The goal of this Notebook is to take the parts of NovemberPhyloProc.ipynb that I expect to go into the actual manuscript.
Figure 1. Antibodies over time.
Figure 2. Weighted unifrac PCoA
Table 1. Kernel regression and weighted unifrac 1 test - binomial
Table S1. Kernel regression and weighted unifrac 1 test - gaussian
Table S2. Weighted unifrac components 2-N
Figure S1. Jensen Shannon at different agglomeration levels.
Figure S2. Statistically significant (p< 0.05, q < 0.2) family genus and species abundances (clr transformed) regressed against weighted unifrac 1.
Table S3. All family - genus and species vs antibody vs glm scores.
Figure 3. Stacked bars of key groups ordered by weighted unifrac axis 1.
Figure S3. Groups associated with IgGs.
Figure 4. Proportionality heat-map
3/20/2017 Re-ran whole pipeline end-to-end. Hits on everything (including igg gp41 0 day) except only trending on gp41 Month 6.5. However I don’t see family level groups for gp41 that relate to community structure (q < 0.2, p < 0.05). I may just try running everything end to end a few more times to see what happens. This because I want to see how consistant (or otherwise) the results are between runs. Also, I’m going to start setting seeds now.
# only use library paths in the anaconda environment
#.libPaths(grep('anaconda3', .libPaths(), value = T))
.libPaths()
## [1] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib/x86_64-pc-linux-gnu/3.6.0"
## [2] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib-ext"
## [3] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib-R"
R.version
## _
## platform x86_64-pc-linux-gnu
## arch x86_64
## os linux-gnu
## system x86_64, linux-gnu
## status beta
## major 3
## minor 6.0
## year 2019
## month 04
## day 11
## svn rev 76379
## language R
## version.string R version 3.6.0 beta (2019-04-11 r76379)
## nickname Planting of a Tree
sessionInfo()
## R version 3.6.0 beta (2019-04-11 r76379)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.0 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
## [5] htmltools_0.3.6 tools_3.6.0 yaml_2.2.0 Rcpp_0.12.19
## [9] stringi_1.2.4 rmarkdown_1.10 knitr_1.20 stringr_1.3.1
## [13] digest_0.6.18 packrat_0.4.8-1 evaluate_0.12
# https://stackoverflow.com/questions/46354826/have-a-function-that-calls-library-and-takes-either-a-package-or-its-name-as-inp
# Also return package version when loading in packages
# accept strings or functions
libver <- function(pac){
pac <- as.character(substitute(pac))
library(pac, character.only=TRUE)
packageVersion(pac)
}
#libver("dada2")
#libver("ggplot2")
libver("Cairo")
## [1] '1.5.9'
# Much of the data handling
libver('phyloseq')
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
## Registered S3 method overwritten by 'dplyr':
## method from
## as.data.frame.tbl_df tibble
## [1] '1.22.3'
# A bunch of environments, including ggplot, dplyr, tidyr, and broom, which I use a lot
libver('tidyverse')
## Registered S3 method overwritten by 'rvest':
## method from
## read_xml.response xml2
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.7
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## [1] '1.2.1'
# Mostly for concatenating ggplots
library(gridExtra); packageVersion("gridExtra")
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
## [1] '2.3'
# I use this surprisingly not a lot here.
library(vegan); packageVersion("vegan")
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-3
## [1] '2.5.3'
# For making trees
# libver('phangorn')
# A prerequesite to phangorn
# libver("DECIPHER")
# Some pre-processing stuff
# libver("dada2")
# I usually reshape with tidyverse tools now, but melt and cast are often easier in a pinch
# libver("reshape2")
# For replacing NaNs without too much thought.
# libver("imputeMissings")
# Deal with proportional data, especially useful for calculating proportionality phi
libver('compositions')
## Loading required package: tensorA
##
## Attaching package: 'tensorA'
## The following object is masked from 'package:base':
##
## norm
## Loading required package: robustbase
## Loading required package: energy
## Loading required package: bayesm
## Welcome to compositions, a package for compositional data analysis.
## Find an intro with "? compositions"
##
## Attaching package: 'compositions'
## The following objects are masked from 'package:stats':
##
## cor, cov, dist, var
## The following objects are masked from 'package:base':
##
## %*%, scale, scale.default
## [1] '1.40.2'
# Works with tidyverse to make model output tidy
libver('broom')
## [1] '0.5.0'
# Make pretty tables
libver('knitr')
## [1] '1.20'
libver('kableExtra')
## [1] '0.9.0'
# Let those pretty tables actually show up in a jupyter notebook
#library('IRdisplay')
# For bootstrapping
libver('boot')
##
## Attaching package: 'boot'
## The following object is masked from 'package:robustbase':
##
## salinity
## The following object is masked from 'package:lattice':
##
## melanoma
## [1] '1.3.20'
# Calculate kernel regressions
libver("MiRKAT")
## Loading required package: survival
##
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
##
## aml
## The following object is masked from 'package:robustbase':
##
## heart
## Loading required package: PearsonDS
## Loading required package: GUniFrac
## Loading required package: ape
##
## Attaching package: 'ape'
## The following object is masked from 'package:compositions':
##
## balance
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:robustbase':
##
## colMedians, rowMedians
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## [1] '1.0.1'
libver("car")
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:boot':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
## [1] '3.0.2'
#libver(mclust)
#libver(chemometrics)
libver(purrrlyr)
## [1] '0.0.3'
libver('qvalue')
## [1] '2.10.0'
libver("breakaway")
## [1] '4.6.11'
I have put the functions in a library file
source('libraries/library096.R')
# Set upOriginal to false, if you want to used user-reprocessed data.
# Results may differ slightly from those in the manuscript due to inter-run variation
# especially in the tree-ing algorithm.
upOriginal <- TRUE
# upOriginal <- FALSE
# For permutation tests, how fast do things need to run
# 9999 for most runs, 99999 for publication quality ones suggested
jnperm <- 99999
# Data paths
getwd()
## [1] "/data/users/cram/Project/Nyvac_096_Microbiome"
(mapping_file_path <- file.path('data', 'mapping_file_096a.csv'))
## [1] "data/mapping_file_096a.csv"
(immune_file_path <- file.path('data', 'immune096b.csv'))
## [1] "data/immune096b.csv"
if(upOriginal){
seqtab_file_path <- file.path('data', 'seqtab.nochimNov2017.csv')
taxa_file_path <- file.path('data', 'TaxaNov2017.csv')
tree_path <- file.path('data', 'phylogeny096NovTree.tre')
} else {
seqtab_file_path <- file.path('data1', 'seqtab.nochimMar2018.csv')
taxa_file_path <- file.path('data1', 'TaxaMar2018.csv')
tree_path <- file.path('data1', 'phylogeny096Mar2018tre.tre')
}
seqtab_file_path
## [1] "data/seqtab.nochimNov2017.csv"
taxa_file_path
## [1] "data/TaxaNov2017.csv"
tree_path
## [1] "data/phylogeny096NovTree.tre"
# Sequence data
seqtab.nochim.data <- read.csv(seqtab_file_path)
seqtabNames = gsub('\\.', '-',
gsub('.fastq', '', seqtab.nochim.data$X)
)
seqtab.nochim = as.matrix(seqtab.nochim.data[,-1])
rownames(seqtab.nochim) = seqtabNames
# Taxa names
taxa.data <- read.csv(taxa_file_path)
taxa = taxa.data[,-1]
## I reverse complemented the sequences to generate the taxonomy
# (but only in this latest re-run, not the original)
## The following undoes that reverse complement to get original sequence
#rownames(taxa) = dada2:::rc(taxa.data[,1])
if(upOriginal){
rownames(taxa) = (taxa.data[,1])} else {
rownames(taxa) = dada2:::rc(taxa.data[,1])
}
taxa <- as.matrix(taxa)
# Mapping file
mapping.data <- read_csv(mapping_file_path) %>%
mutate(pub_id = sapply(pub_id, function(x) {as.numeric(gsub("096-", "", x))}))
## Parsed with column specification:
## cols(
## SampleID = col_character(),
## BarcodeSequence = col_character(),
## LinkerPrimerSequence = col_character(),
## ReversePrimer = col_character(),
## run_prefix = col_character(),
## pub_id = col_character(),
## Sex = col_character(),
## Visit = col_integer(),
## visitRank = col_integer(),
## RXCode = col_character(),
## Description = col_character()
## )
## Warning: The `printer` argument is deprecated as of rlang 0.3.0.
## This warning is displayed once per session.
#mapping = mapping.data[,-1]
#rownames(mapping) = mapping.data[,1]
#mapping <- as.matrix(mapping)
head(mapping.data)
## # A tibble: 6 x 11
## SampleID BarcodeSequence LinkerPrimerSeq… ReversePrimer run_prefix pub_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Sample-… TTACGC GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201 282
## 2 Sample-… TTAGGT GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201 282
## 3 Sample-… TTCCAC GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201 176
## 4 Sample-… TTGTAC GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201 176
## 5 Sample-… TTGTGT GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201 123
## 6 Sample-… TATCAC GCGGACTACCVGGGT… GCACTCCTRCGG… JH3VWZ201 49
## # ... with 5 more variables: Sex <chr>, Visit <int>, visitRank <int>,
## # RXCode <chr>, Description <chr>
# Immune Data
immune.data0 <- read_csv(immune_file_path)
## Parsed with column specification:
## cols(
## visitno = col_integer(),
## rx_code = col_character(),
## type = col_character(),
## antigen = col_character(),
## mag = col_double(),
## mag_bl = col_double(),
## response = col_integer(),
## day = col_integer(),
## month = col_double(),
## ct = col_character(),
## response_j = col_double(),
## assay = col_character(),
## pub_id = col_character()
## )
immune.data <- mutate(immune.data0, pub_id = sapply(pub_id, function(x) {as.numeric(gsub("096-", "", x))}))
levels(immune.data$antigen) <- gsub("[ /]", ".", levels(immune.data$antigen))
head(immune.data)
## # A tibble: 6 x 13
## visitno rx_code type antigen mag mag_bl response day month ct
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 5 CTRL IgA gp41 180. 214. 0 42 1.5 CTRL
## 2 7 CTRL IgA gp41 175. 214. 0 98 3.5 CTRL
## 3 9 CTRL IgA gp41 177. 214. 0 182 6.5 CTRL
## 4 12 CTRL IgA gp41 173 214. 0 364 12 CTRL
## 5 5 CTRL IgA p24 447. 454. 0 42 1.5 CTRL
## 6 7 CTRL IgA p24 767. 454. 0 98 3.5 CTRL
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
# Phylogenetic tree
seqs <- dada2::getSequences(seqtab.nochim)
names(seqs) <- seqs
pt <- ape::read.tree(file=tree_path)
pt2 <- phangorn::midpoint(pt)
immune.data$antigen %>% unique
## [1] "gp41" "p24" "Con.6.gp120.B"
## [4] "ZM96.gp140" "gp70_B.CaseA_V1_V2" "ANY.ENV.PTEG"
Save options to a variable
par0 <- options()
## minimal sample identification data
pub_id_key <- unique(immune.data[,c("pub_id", "rx_code", "ct")])
sample_sm0 <- dplyr::select(mapping.data, SampleID, pub_id, sex = Sex, muVisit = Visit, muVisitRank = visitRank)
sample_sm <- left_join(sample_sm0, pub_id_key, by = "pub_id") %>%
as.data.frame %>%
tibble::column_to_rownames(var = "SampleID")
## Warning: `chr_along()` is deprecated as of rlang 0.2.0.
## This warning is displayed once per session.
## Warning: Column `pub_id` has different attributes on LHS and RHS of join
# rownames(sample_sm)
# head(sample_sm)
# Make raw phyloseq object
ot <- otu_table(seqtab.nochim, taxa_are_rows=FALSE)
tt <- tax_table(taxa)
dimnames(tt) = dimnames(taxa)
spl <- sample_data(sample_sm)
psN is a really raw phyloseq object * OTU names are given as accession numbers * Numbers are in total counts * We have samples from both time points
# Quite raw phyloseq object. Species names are given as accession numbers
psN <- phyloseq(ot, tt, spl, pt2)
psN
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 960 taxa and 47 samples ]
## sample_data() Sample Data: [ 47 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
I want to make a phyloseq object for use in essentally all of the subsequent analyses. Features include: * Some basic taxonomic pre-processing. * No uncharacterized phyla. * Only OTUs that show up at least 10% of the time in the final data set . * Do this after filtering samples. * No tip-glomming. I’ll save that untill later. * Immune data is included in the sample data table. * We’ll do Andrew’s representitive IgGs and IgAs. * We only have samples from visit 1. * We only have samples from experemental (not control) groups.
immune.data %>% pull(type) %>% unique
## [1] "IgA" "IgG" "CD4+"
#immune.data %>% unite(type_antigen ,type, antigen, sep = "_")
immune.data %>% dplyr::select(pub_id, month, type, antigen, mag) %>%
filter(month %in% c(0, 6.5, 12)) %>%
unite(type_antigen, type, antigen, sep = "_") %>%
unite(type_antigen_month,type_antigen, month, sep = "_Month_") %>%
spread(key = type_antigen_month, value = mag, drop = TRUE) -> immune.table
immune.table %>% head
## # A tibble: 6 x 25
## pub_id `CD4+_ANY.ENV.P… `CD4+_ANY.ENV.P… `CD4+_ANY.ENV.P…
## <dbl> <dbl> <dbl> <dbl>
## 1 3 0.025 NA 0.0678
## 2 4 0.025 NA NA
## 3 5 0.025 0.025 0.025
## 4 7 0.025 0.025 0.025
## 5 12 0.025 0.025 0.0515
## 6 14 0.025 0.025 0.302
## # ... with 21 more variables: IgA_gp41_Month_0 <dbl>,
## # IgA_gp41_Month_12 <dbl>, IgA_gp41_Month_6.5 <dbl>,
## # IgA_p24_Month_0 <dbl>, IgA_p24_Month_12 <dbl>,
## # IgA_p24_Month_6.5 <dbl>, IgG_Con.6.gp120.B_Month_0 <dbl>,
## # IgG_Con.6.gp120.B_Month_12 <dbl>, IgG_Con.6.gp120.B_Month_6.5 <dbl>,
## # IgG_gp41_Month_0 <dbl>, IgG_gp41_Month_12 <dbl>,
## # IgG_gp41_Month_6.5 <dbl>, IgG_gp70_B.CaseA_V1_V2_Month_0 <dbl>,
## # IgG_gp70_B.CaseA_V1_V2_Month_12 <dbl>,
## # IgG_gp70_B.CaseA_V1_V2_Month_6.5 <dbl>, IgG_p24_Month_0 <dbl>,
## # IgG_p24_Month_12 <dbl>, IgG_p24_Month_6.5 <dbl>,
## # IgG_ZM96.gp140_Month_0 <dbl>, IgG_ZM96.gp140_Month_12 <dbl>,
## # IgG_ZM96.gp140_Month_6.5 <dbl>
Some investegation suggested by the phyloseq tutorials to identify phyla for removal, and to identify an abundance threshold
psN
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 960 taxa and 47 samples ]
## sample_data() Sample Data: [ 47 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
psN %>% subset_samples(!is.na(pub_id))
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 960 taxa and 38 samples ]
## sample_data() Sample Data: [ 38 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
# skip the blanks
psN %>% subset_samples(!is.na(pub_id)) %>%
# OTUs must be characterized to phylum
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) -> psN_hasPhylum
psN_hasPhylum
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 929 taxa and 38 samples ]
## sample_data() Sample Data: [ 38 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 929 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 929 tips and 928 internal nodes ]
# from 960 to 929 otus
Identifying and removing phyla with very few taxa in them
prevdf = apply(X = otu_table(psN_hasPhylum),
MARGIN = ifelse(taxa_are_rows(psN_hasPhylum), yes = 1, no = 2),
FUN = function(x){sum(x > 0)})
# Add taxonomy and total read counts to this data.frame
prevdf = data.frame(Prevalence = prevdf,
TotalAbundance = taxa_sums(psN_hasPhylum),
tax_table(psN_hasPhylum))
plyr::ddply(prevdf, "Phylum", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})
## Phylum 1 2
## 1 Actinobacteria 7.953846 517
## 2 Bacteroidetes 9.414634 1930
## 3 Elusimicrobia 2.000000 2
## 4 Firmicutes 9.711604 5691
## 5 Fusobacteria 5.875000 94
## 6 Proteobacteria 6.960000 348
## 7 Synergistetes 3.750000 15
## 8 Tenericutes 4.000000 4
## 9 Verrucomicrobia 6.000000 6
filterPhyla = c("Verrucomicrobia", "Tenericutes", "Elusimicrobia", "Synergistetes")
psN_MainPhyla = subset_taxa(psN_hasPhylum, !Phylum %in% filterPhyla)
psN_MainPhyla
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 922 taxa and 38 samples ]
## sample_data() Sample Data: [ 38 samples by 6 sample variables ]
## tax_table() Taxonomy Table: [ 922 taxa by 8 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 922 tips and 921 internal nodes ]
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
# Include a guess for parameter
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
scale_x_log10() + xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) + theme(legend.position="none")
## Warning: Transformation introduced infinite values in continuous x-axis
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
# Include a guess for parameter
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
scale_x_continuous() + xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) + theme(legend.position="none")
Make a phyloseq object like psN2 but with all participants, psN2A Code copied from above.
(phyloseq object of relative abundances)
And psN1 (phyloseq object of counts)
psN1A and psN2A include all participants, and will be used to look at variability within participants
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# only use data from humans (no extraction controls)
subset_samples(is.finite(muVisitRank)) %>%
# only otus from known taxa that show up frequently enough
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
# only otus that show up in at least 10% of samples
prevalence_filter_taxa %>%
# convert to relative abundance
tag_phyloseq%>%
# Instead of naming each taxon with its full sequence, we use the "tag" instead
swap.phyloseq.taxnames %>%
pass -> psN1A # Save pre relative abundance transformation
## Warning: `list_len()` is deprecated as of rlang 0.2.0.
## Please use `new_list()` instead.
## This warning is displayed once per session.
## Warning: Setting row names on a tibble is deprecated.
# add is-male
manColumn <- psN1A %>% sample_data %>% as('data.frame') %>% rownames_to_column %>% mutate(isMale = testIsMaleVec(sex)) %>% dplyr::select(rowname, isMale)
psN1A <- phylo_join(psN1A, manColumn, by = 'rowname')
## psN2 is like psN1 but with relative abundances
psN1A %>%
transform_sample_counts(function(x) {x/sum(x)}) %>%
# The "tag" is a new name that takes into account the rest of the taxonomy data
# the tag may need to be updated after any agglomeration
pass-> psN2A
# filter to just microbiome visit 1 and experemental treatments
psN1A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN1
psN2A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN2
# Calculate weighted unifrac distances and role those in.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
psN2.pcoa.df <- psN2.pcoa %>% scores(display = "sites") %>%
as.data.frame %>%
rownames_to_column %>%
dplyr::select('rowname', 'MDS1', 'MDS2') %>%
mutate(rMDS1 = rank(MDS1)) %>% # rank order of MDS1
mutate(rrMDS1 = formatC(format = "d", rMDS1, flag = "0", width=ceiling(log10(max(rMDS1))))) %>%
unite(newname, rrMDS1, rowname, sep = "_", remove = FALSE) %>%
dplyr::select(-rrMDS1)
psN2 %>%
phylo_join(
psN2.pcoa.df,
by = 'rowname'
) -> psN2
## Even if the data are counts,
## the weighted unifrac pcoa is still done on the relative abundances
psN1 %>%
phylo_join(
psN2.pcoa.df,
by = 'rowname'
) -> psN1
psN2A
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 651 taxa and 38 samples ]
## sample_data() Sample Data: [ 38 samples by 31 sample variables ]
## tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1A
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 651 taxa and 38 samples ]
## sample_data() Sample Data: [ 38 samples by 31 sample variables ]
## tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN2
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 651 taxa and 21 samples ]
## sample_data() Sample Data: [ 21 samples by 35 sample variables ]
## tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 651 taxa and 21 samples ]
## sample_data() Sample Data: [ 21 samples by 35 sample variables ]
## tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
How many taxa were still present after each filtering step?
# Find number of taxa in available samples
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# filter to just microbiome visit 1 and experemental treatments
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
prevalence_filter_taxa(thresh = 0) %>%
pass-> psInSamples
(NInSamples <- dim(otu_table(psInSamples))[2])
## [1] 960
# Number of taxa with unidentified phyla
psInSamples %>%
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
pass -> psIdentifiedPhylum
(NUnkPhylum <- NInSamples - dim(otu_table(psIdentifiedPhylum))[2])
## [1] 31
filterPhyla
## [1] "Verrucomicrobia" "Tenericutes" "Elusimicrobia" "Synergistetes"
# Phyla removed because they are in filterPhyla
# -- each of which show up fewer than 20 times in the data set
psIdentifiedPhylum %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
pass -> psNotPhylaFiltered
(NFiltPhyla <- dim(otu_table(psIdentifiedPhylum))[2] - dim(otu_table(psNotPhylaFiltered))[2])
## [1] 7
# Taxa removed because there were in fewer than 10% of the samples
psNotPhylaFiltered %>%
prevalence_filter_taxa %>%
pass -> psPFT
dim(otu_table(psNotPhylaFiltered))[2] - dim(otu_table(psPFT))[2]
## [1] 386
How to participants’ immune profiles change over time?
# When were participants vaccinated?
# Copied from protocol apendix E
# visitno 1 is a screening visit, I assign it NaN
dayTable = data.frame(
visitno = seq(from = 1, to = 14, by = 1),
day = c(NaN, 0, 14, 28, 42, 84, 98, 168, 182, 196, 273, 364, 455, 545),
month = c(NaN, 0, 0.5, 1, 1.5, 3, 3.5, 6, 6.5, 7, 9, 12, 15, 18)
)
vac <- data.frame(
visitno = c(2, 4, 6, 8)
)
vac <- left_join(vac, dayTable, by = 'visitno')
vac
## visitno day month
## 1 2 0 0
## 2 4 28 1
## 3 6 84 3
## 4 8 168 6
# Representitive antigens for further considerations
# These are essentially zero (mag = 1) at baseline
ants1 <- c('Con.6.gp120.B', 'ZM96.gp140', 'gp70_B.CaseA_V1_V2')
# These have measurable baseline magnitudes
ants2 <- c('gp41', 'p24')
donor.immune <- psN2 %>% sample_data %>% as('data.frame') %>% dplyr::select(pub_id) %>%
left_join(immune.data, by = 'pub_id')
## Warning: Column `pub_id` has different attributes on LHS and RHS of join
donor.immune %>% head
## pub_id visitno rx_code type antigen mag mag_bl response day month ct
## 1 282 5 T1 IgA gp41 352.00 109.5 0 42 1.5 T
## 2 282 7 T1 IgA gp41 276.50 109.5 0 98 3.5 T
## 3 282 9 T1 IgA gp41 333.25 109.5 0 182 6.5 T
## 4 282 12 T1 IgA gp41 1.00 109.5 0 364 12.0 T
## 5 282 5 T1 IgA p24 313.50 329.8 0 42 1.5 T
## 6 282 7 T1 IgA p24 377.50 329.8 0 98 3.5 T
## response_j assay
## 1 -0.18339353 BAMA
## 2 -0.10360764 BAMA
## 3 0.19882813 BAMA
## 4 0.15338760 BAMA
## 5 0.02648181 BAMA
## 6 -0.10988643 BAMA
psN %>% sample_data %>%
as('data.frame') %>%
filter(!is.na(pub_id)) %>%
pull(pub_id) %>%
unique %>%
pass -> microbiomeCohort
immune.data %>% filter(pub_id %in% microbiomeCohort) %>%
pass -> donor.immune
donor.immune %>% head
## # A tibble: 6 x 13
## visitno rx_code type antigen mag mag_bl response day month ct
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 5 T1 IgA gp41 352 110. 0 42 1.5 T
## 2 7 T1 IgA gp41 276. 110. 0 98 3.5 T
## 3 9 T1 IgA gp41 333. 110. 0 182 6.5 T
## 4 12 T1 IgA gp41 1 110. 0 364 12 T
## 5 5 T1 IgA p24 314. 330. 0 42 1.5 T
## 6 7 T1 IgA p24 378. 330. 0 98 3.5 T
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
options(par0)
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")
ggsave('figures/useiggsAllParticipants.svg')
## Saving 7 x 5 in image
# To fix. Control groups don't show up in this version.
iggplot
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type %in% c('IgA', 'CD4+') & antigen %in% c(ants2, 'ANY.ENV.PTEG'))%>%
mutate(antigen = factor(antigen, levels = c(ants2, 'ANY.ENV.PTEG'))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen(), scales = 'free_y') +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^c(
seq(from = -2, to = 0, by = 0.25), seq(from = 0, to = 5, by = 1)
), labels = function(x) round(as.numeric(x), digits=3)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")
iggplot
ggsave('figures/useIgACD4AllParticipants.svg')
## Saving 7 x 5 in image
# To fix. Control groups don't show up in this version.
iggplot <- donor.immune %>% filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id)) + geom_point(alpha = 0.6) + geom_line(alpha = 0.4) +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() + theme(strip.text.y = element_text(angle = 0), axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
labs(y = "BAMA Response Magnitude")
iggplot
ggsave('figures/useiggs.svg') # I can no longer save pngs with transparency, going to svg
## Saving 7 x 5 in image
# To fix. Control groups don't show up in this version.
colnames(immune.data)
## [1] "visitno" "rx_code" "type" "antigen" "mag"
## [6] "mag_bl" "response" "day" "month" "ct"
## [11] "response_j" "assay" "pub_id"
Plan. Calculate mean and variance of each antigen-type at visit 9, and at baseline.
#
geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE, ...)
{
exp(sd(log(x, ...), na.rm = na.rm, ...))
}
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id, antigen, visitno, mag, mag_bl, type) %>% group_by(antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl)
## # A tibble: 8 x 8
## # Groups: antigen [6]
## antigen type mean_mag mean_bl sd_mag sd_bl var_over_mean_m…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ANY.EN… CD4+ 9.20e-2 NA 2.51 NA 68.7
## 2 Con.6.… IgG 1.14e+4 2.54 8.33 4.64 0.00607
## 3 gp41 IgA 9.87e+1 15.9 6.42 7.04 0.417
## 4 gp41 IgG 5.56e+3 208. 3.76 4.91 0.00254
## 5 gp70_B… IgG 1.98e+3 1.48 7.40 2.66 0.0277
## 6 p24 IgA 2.50e+3 422. 4.02 2.48 0.00646
## 7 p24 IgG 2.55e+4 152. 2.44 6.34 0.000233
## 8 ZM96.g… IgG 1.81e+3 8.83 7.07 9.01 0.0276
## # ... with 1 more variable: var_over_mean_bl <dbl>
Mean variance and variance over mean of each value.
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl) %>%
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl) %>%
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val")
## # A tibble: 8 x 8
## # Groups: antigen, type [8]
## antigen type mean_bl mean_mag sd_bl sd_mag var_over_mean_bl
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ANY.EN… CD4+ NA 9.76e-2 NA 2.37 NA
## 2 Con.6.… IgG 2.60 1.21e+4 4.77 8.59 9.04
## 3 gp41 IgA 17.4 1.06e+2 7.01 6.62 4.30
## 4 gp41 IgG 215. 6.43e+3 4.94 3.51 0.135
## 5 gp70_B… IgG 1.51 1.99e+3 2.61 7.93 4.64
## 6 p24 IgA 428. 3.34e+3 2.51 3.38 0.0147
## 7 p24 IgG 155. 2.56e+4 6.94 2.44 0.466
## 8 ZM96.g… IgG 8.85 1.98e+3 9.47 7.15 10.6
## # ... with 1 more variable: var_over_mean_mag <dbl>
As above, but this time, calculated seperately for each treatment group and then those values averaged. ZM96 and gp70 surprisingly large variance over mean. Maybe should look at delta magnitude
immune.data%>% head
## # A tibble: 6 x 13
## visitno rx_code type antigen mag mag_bl response day month ct
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 5 CTRL IgA gp41 180. 214. 0 42 1.5 CTRL
## 2 7 CTRL IgA gp41 175. 214. 0 98 3.5 CTRL
## 3 9 CTRL IgA gp41 177. 214. 0 182 6.5 CTRL
## 4 12 CTRL IgA gp41 173 214. 0 364 12 CTRL
## 5 5 CTRL IgA p24 447. 454. 0 42 1.5 CTRL
## 6 7 CTRL IgA p24 767. 454. 0 98 3.5 CTRL
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
immune.data %>%
mutate(mag_delta = mag / mag_bl) %>%
mutate(mag_delta = if_else(mag_delta < 1, 1, mag_delta)) %>%
dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, mag_delta, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), mean_delta = geomean(mag_delta), sd_mag = geosd(mag), sd_bl = geosd(mag_bl), sd_delta = geosd(mag_delta)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl, var_over_mean_delta = sd_delta^2/mean_delta) %>%
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl, mean_delta, sd_delta, var_over_mean_delta) %>%
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val") %>%
arrange(type)
## # A tibble: 8 x 11
## # Groups: antigen, type [8]
## antigen type mean_bl mean_delta mean_mag sd_bl sd_delta sd_mag
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ANY.EN… CD4+ NA NA 9.76e-2 NA NA 2.37
## 2 gp41 IgA 17.4 8.97 1.06e+2 7.01 5.63 6.62
## 3 p24 IgA 428. 7.27 3.34e+3 2.51 3.89 3.38
## 4 Con.6.… IgG 2.60 5310. 1.21e+4 4.77 13.6 8.59
## 5 gp41 IgG 215. 35.8 6.43e+3 4.94 6.98 3.51
## 6 gp70_B… IgG 1.51 1417. 1.99e+3 2.61 9.60 7.93
## 7 p24 IgG 155. 176. 2.56e+4 6.94 7.87 2.44
## 8 ZM96.g… IgG 8.85 239. 1.98e+3 9.47 13.9 7.15
## # ... with 3 more variables: var_over_mean_bl <dbl>,
## # var_over_mean_delta <dbl>, var_over_mean_mag <dbl>
immune.data %>%
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
## # A tibble: 5 x 2
## rx_code Unique_ids
## <chr> <int>
## 1 CTRL 16
## 2 T1 20
## 3 T2 20
## 4 T3 20
## 5 T4 20
donor.immune %>%
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
## # A tibble: 5 x 2
## rx_code Unique_ids
## <chr> <int>
## 1 CTRL 2
## 2 T1 7
## 3 T2 3
## 4 T3 6
## 5 T4 5
Number of participants with a given day of first sample.
psN2 %>% sample_data %>% data.frame %>% group_by(muVisit) %>% summarize(n = length(muVisit))
## # A tibble: 3 x 2
## muVisit n
## <int> <int>
## 1 2 3
## 2 9 11
## 3 12 7
How many donors were there from each treatment?
psN2 %>% sample_data %>% data.frame %>% group_by(rx_code) %>% summarize(n = length(muVisit))
## # A tibble: 4 x 2
## rx_code n
## <chr> <int>
## 1 T1 7
## 2 T2 3
## 3 T3 6
## 4 T4 5
Breakdown by both visit and treatment
sampleBreakdown <- psN2 %>% sample_data %>% data.frame %>% group_by(muVisit, rx_code) %>% summarize(n = length(muVisit)) %>% spread(key = rx_code, value = n, fill = 0) %>% mutate(total = T1 + T2 + T3 + T4)
sbcs <- colSums(sampleBreakdown)
sampleBreakdown <- bind_rows(sampleBreakdown, sbcs)
sampleBreakdown %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
#kable_styling("striped", "hover", full_width = F) %>%
#collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> sampleBreakdown.html
sampleBreakdown.html
| muVisit | T1 | T2 | T3 | T4 | total |
|---|---|---|---|---|---|
| 2 | 3 | 0 | 0 | 0 | 3 |
| 9 | 4 | 2 | 5 | 0 | 11 |
| 12 | 0 | 1 | 1 | 5 | 7 |
| 23 | 7 | 3 | 6 | 5 | 21 |
sampleBreakdown.html %>% cat(file = 'tables/sampleBreakdown.html')
psN2A.wuf <- phyloseq::distance(psN2A, method = "wunifrac")
psN2A %>% sample_data %>% .[1:5, 1:10]
## pub_id sex muVisit muVisitRank rx_code ct
## Sample-57 282 Male 9 1 T1 T
## Sample-58 282 Male 12 2 T1 T
## Sample-60 176 Female 9 1 T1 T
## Sample-61 176 Female 12 2 T1 T
## Sample-62 123 Female 12 1 T4 T
## CD4._ANY.ENV.PTEG_Month_0 CD4._ANY.ENV.PTEG_Month_12
## Sample-57 0.025 0.02500000
## Sample-58 0.025 0.02500000
## Sample-60 0.025 0.06841053
## Sample-61 0.025 0.06841053
## Sample-62 0.025 0.42710324
## CD4._ANY.ENV.PTEG_Month_6.5 IgA_gp41_Month_0
## Sample-57 0.06146979 109.5
## Sample-58 0.06146979 109.5
## Sample-60 0.16678641 149.5
## Sample-61 0.16678641 149.5
## Sample-62 0.70855159 33.5
psN2A %>% sample_data %>% as("data.frame") %>% rownames_to_column(var = "Sample") %>% dplyr::select(Sample, pub_id) %>%
pass -> S2P
All.equal <- Vectorize(function(x,y){x == y})
## Convert distance matrix into long form matrix
psN2A.wuf %>% as.matrix %>% as.data.frame %>%
rownames_to_column(var = "SampleX")%>%
gather(key = "SampleY", value = "WufDist", -SampleX) %>%
left_join(S2P, by = c("SampleX" = "Sample")) %>% rename(pub_id_x = pub_id) %>%
left_join(S2P, by = c("SampleY" = "Sample")) %>% rename(pub_id_y = pub_id) %>%
mutate(SampleX = as.numeric(str_extract(SampleX, "[0-9][0-9]"))) %>%
mutate(SampleY = as.numeric(str_extract(SampleY, "[0-9][0-9]"))) %>%
## discard diagonal discard and upper half of triangular matrix
filter(SampleX < SampleY) %>%
mutate(isSamePerson = All.equal(pub_id_x, pub_id_y)) %>%
# ## discard cases where pub_id is unknown
# filter(is.finite(pub_id_x) & is.finite(pub_id_y))
pass -> AllWufDist
AllWufDist %>% head
## SampleX SampleY WufDist pub_id_x pub_id_y isSamePerson
## 1 57 58 0.3769328 282 282 TRUE
## 2 57 60 0.2754723 282 176 FALSE
## 3 58 60 0.3852072 282 176 FALSE
## 4 57 61 0.4105227 282 176 FALSE
## 5 58 61 0.3384203 282 176 FALSE
## 6 60 61 0.3868663 176 176 TRUE
#AllWufDist %>% ggplot(aes(x = isSamePerson, y = WufDist)) + geom_violin() + geom_dotplot(binaxis = "y", stackdir = "center", binwidth = .005)
Get Mean values for between participant and within participant weighted unifrac distances.
WufMeans <- AllWufDist %>% group_by(isSamePerson) %>% summarize(mean = mean(WufDist))
WufMeans
## # A tibble: 2 x 2
## isSamePerson mean
## <lgl> <dbl>
## 1 FALSE 0.497
## 2 TRUE 0.411
#SameAndDiff <- data.frame(comparison = c("different", "same"), WufMeans$mean)
SameAndDiff <- WufMeans %>% spread(key = isSamePerson, value = mean) %>% rename(between = 'FALSE', within = 'TRUE') %>% mutate(diff = between-within)
SameAndDiff
## # A tibble: 1 x 3
## between within diff
## <dbl> <dbl> <dbl>
## 1 0.497 0.411 0.0860
#SameAndDiff[1,2] - SameAndDiff[2,2] # Difference in weighted unifrac dissimilarity between same and different partcipants
Bootstrap some confidence intervals of within and between participant weighted unifrac distances.
# Split the data
SamePersonWufDist <- AllWufDist %>% filter(isSamePerson) #%>% dplyr::select(WufDist)
DifferentPersonWufDist <- AllWufDist %>% filter(!isSamePerson)
set.seed(334)
bootsSame <- rsample::bootstraps(SamePersonWufDist, times = 10000)
bootsDifferent <- rsample::bootstraps(DifferentPersonWufDist, times = 10000)
mean_of_bootstrap <- function(split){
locVals <- rsample::analysis(split)$WufDist
mean(locVals)
}
boot_meansSame <- bootsSame %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)
boot_meansDifferent <- bootsDifferent %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)
boot_means <- bind_cols(within = boot_meansSame$mean, between = boot_meansDifferent$mean) %>%
mutate(isDifferentBigger = between>within,
DifMinusSame = between - within)
#boot_means
1- sum(boot_means$isDifferentBigger)/length(boot_means$isDifferentBigger)
## [1] 0.012
The above is a bootstrapped P value that the two are different from eachother. Per a conversation I had with Klaus Hubert. Still need to find a justification that this approach is legit.
boot_means %>% ggplot(aes(x = DifMinusSame)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
A histogram of bootstrapped differences between within participant and between participant mean values.
quantile(boot_means$DifMinusSame, c(0.025, 0.5, 0.975))
## 2.5% 50% 97.5%
## 0.01273644 0.08721308 0.15521066
95% confidence intervals of the differences between same and different person microbiota.
PermWufDist <- AllWufDist %>% modelr::permute(10000, WufDist)
mean_of_is_same <- function(df){df %>% as.data.frame %>% group_by(isSamePerson) %>% summarize(mean(WufDist)) %>% spread(isSamePerson, `mean(WufDist)`)}
test <- PermWufDist %>% pull(perm) %>% .[[1]] %>% as.data.frame
test %>% head
## SampleX SampleY WufDist pub_id_x pub_id_y isSamePerson
## 1 57 58 0.2863799 282 282 TRUE
## 2 57 60 0.5667858 282 176 FALSE
## 3 58 60 0.6974452 282 176 FALSE
## 4 57 61 0.8762536 282 176 FALSE
## 5 58 61 0.3909422 282 176 FALSE
## 6 60 61 0.8243214 176 176 TRUE
mean_of_is_same(test)
## # A tibble: 1 x 2
## `FALSE` `TRUE`
## <dbl> <dbl>
## 1 0.496 0.462
SameAndDiff
## # A tibble: 1 x 3
## between within diff
## <dbl> <dbl> <dbl>
## 1 0.497 0.411 0.0860
permutedMeans <- PermWufDist %>% mutate(meanvals = map(perm, mean_of_is_same)) %>% dplyr::select(meanvals) %>% unnest %>%
rename(between = `FALSE`, within = `TRUE`) %>%
mutate(diff = between - within, absdif = abs(diff)) %>%
mutate(isExtreme = absdif >= SameAndDiff$diff, isExtreme1Tail = diff >= SameAndDiff$diff)
#permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Fraction of permuted values less extreme than difference between same and different.
#permutedMeans %>% mutate(isExtreme = absdif >= SameAndDiff$diff)
sum(permutedMeans$isExtreme) / length(permutedMeans$isExtreme)
## [1] 0.0153
sum(permutedMeans$isExtreme1Tail) / length(permutedMeans$isExtreme1Tail)
## [1] 0.0073
Two and one tailed permuted p-values
options(repr.plot.width=6, repr.plot.height= 6)
boot_means %>% dplyr::select(within, between) %>% gather(key = "comparison", value = "WufDist") %>% ggplot(aes(x = comparison, y = WufDist)) +
geom_dotplot(data = AllWufDist %>% mutate(isSamePerson2 = if_else(isSamePerson, "within", "between"))
, aes(x = isSamePerson2, y = WufDist), binaxis = "y", stackdir = "center", binwidth = .01, colour = "gray40", fill = "white") +
geom_violin(fill = NA) +
geom_point(data = data.frame(comparison = c("within", "between"), WufDist = WufMeans$mean), aes(x = comparison, y = c(WufDist[2], WufDist[1])), shape = 22, fill = "black", size = 2)+
theme_bw() +
scale_x_discrete(limits = c("within", "between")) + labs(y = "Weighted Unifrac Distance") #
ggsave('figures/BetweenVsWithin.png')
## Saving 7 x 5 in image
Figure: Open circles represent weighed unifrac distances associated with pairs of samples taken from the same set of participants, at different time points (“within”), and samples taken from different sets of participants (“between”). Black squares indicate the observed mean of the within and between values. Violins indicate distributions of bootstrapped mean values.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
# How much variance si explained by each weighted unifrac axis
# Note, ten axes cover 95% of the variance.
# I'm not going to look beyond that for any test.
data.frame(eig = psN2.pcoa$CA$eig) %>%
rownames_to_column('axis') %>%
mutate(proportion = eig/sum(eig)) %>%
mutate(cumulative = cumsum(proportion))
## axis eig proportion cumulative
## 1 MDS1 0.825438822 0.2888040980 0.2888041
## 2 MDS2 0.495612315 0.1734045743 0.4622087
## 3 MDS3 0.364619128 0.1275727475 0.5897814
## 4 MDS4 0.268169168 0.0938268864 0.6836083
## 5 MDS5 0.261743609 0.0915787154 0.7751870
## 6 MDS6 0.162405998 0.0568225245 0.8320095
## 7 MDS7 0.146142884 0.0511323948 0.8831419
## 8 MDS8 0.097842330 0.0342330228 0.9173750
## 9 MDS9 0.061705028 0.0215893226 0.9389643
## 10 MDS10 0.050336955 0.0176118673 0.9565762
## 11 MDS11 0.041072011 0.0143702533 0.9709464
## 12 MDS12 0.022613493 0.0079119971 0.9788584
## 13 MDS13 0.017745217 0.0062086872 0.9850671
## 14 MDS14 0.016323022 0.0057110902 0.9907782
## 15 MDS15 0.014802146 0.0051789668 0.9959571
## 16 MDS16 0.009677461 0.0033859449 0.9993431
## 17 MDS17 0.001877523 0.0006569069 1.0000000
my_breaks = c(1, 75, 250, 500, 1000,2000)
psN2 %>% mutate_phyloseq_sample(
mc41 = factor(medcode_hl(IgG_gp41_Month_0)),
log120 = (IgG_Con.6.gp120.B_Month_12)) -> psN2_mod
psN2_mod%>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = mc41), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp41 Baseline', direction = -1, discrete = TRUE) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
theme_bw() -> wuford_gp41
psN2_mod %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = log120), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp120 Final', direction = 1, trans = "sqrt",
breaks = my_breaks, labels = my_breaks) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
theme_bw() -> wuford_gp120
par <- options()
options(repr.plot.width=11, repr.plot.height= 4)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(wuford_gp41, wuford_gp120, ncol = 2, labels = c("A", "B"), label_size = 24)
g
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, width = 8, height = 4)
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, width = 8, height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, base_width = 8, base_height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, base_width = 8, base_height = 4)
wufKN2 <- D2K(as.matrix(psN2.wuf))
muDoners <- unique(sample_data(psN2)$pub_id)
immune.data %>%
filter(pub_id %in% muDoners) %>%
filter(
(type == 'IgG' &
antigen %in% ants1 &
month %in% c(6.5,12)
) |
(type %in% c('IgG', 'IgA') &
antigen %in% ants2 &
month %in% c(0,6.5,12)
) |
type == 'CD4+' &
antigen == 'ANY.ENV.PTEG' &
month %in% c(6.5, 12)
)-> use.immune
head(use.immune)
## # A tibble: 6 x 13
## visitno rx_code type antigen mag mag_bl response day month ct
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 9 T1 IgA gp41 333. 110. 0 182 6.5 T
## 2 12 T1 IgA gp41 1 110. 0 364 12 T
## 3 9 T1 IgA p24 658 330. 0 182 6.5 T
## 4 12 T1 IgA p24 862. 330. 0 364 12 T
## 5 9 T1 IgG Con.6.… 27128. 1 1 182 6.5 T
## 6 12 T1 IgG Con.6.… 1488 1 1 364 12 T
## # ... with 3 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>
# Do permanova and related tests to a variable of interest
# This function is pretty specific to this analysis, so I'm going to leave it
# here in the notebook file
CapVar <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
## Pull out the needed data
psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname')
medWuf <- NA
rankWuf <- NA
locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
#loc.wuf <- wufKN2
#loc.jsd <- jsdKN2
ydata <- ydata0
ydata <- ydata0[!yna]
loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
medWuf <- adonis(loc.wuf2 ~ transformation(ydata), permutations = nperm)
#medWuf$aov.tab[1,c('R2', 'Pr(>F)')]
## Capscale returns the same results as adonis (permanova), but also gives some other interesting results
medWufCap <- capscale(loc.wuf2 ~ transformation(ydata))
capanova <- anova(medWufCap, permutations = nperm)
samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
left_join(
vegan::scores(medWufCap, display = 'sites') %>% as.data.frame %>% dplyr::select(CAP1) %>%
rownames_to_column, by = 'rowname') %>% .[!yna,]
# # Is giving only positive results with CAP1, not sure why
# glmAnova <- glm(medcode(ydata) ~ MDS1 + CAP1, data = samDf, family = 'binomial') %>% anova(test = "Chisq")
loc_glm <- glm(transformation(ydata) ~ MDS1, data = samDf, family = family)
glmAnova <- loc_glm %>% anova(test = "Chisq")
#glmAnova['CAP1', 'Deviance']/out_capanova['NULL', 'Resid. Dev']
## check against mirkat
loc.Kwuf2 <- wufKN2[!yna, !yna]
mirkatP <- MiRKAT(y = transformation(ydata), Ks = loc.Kwuf2, out_type = "C", method = 'permutation', nperm = nperm)
#list(medWuf, capanova, mirkatP)
pred_pct <- predict(loc_glm, type = "response")
pred_01 <- as.numeric(predict(loc_glm, type = "response") > 0.5)
accuracy <- mean(transformation(ydata) == pred_01)
null_glm <- update(loc_glm, ~1)
# Canonical caluclation of McFadden's R2 for the GLM
McFadden = 1- (logLik(loc_glm)/ logLik(null_glm))
L.full = logLik(loc_glm)
D.full = -2 * L.full
L.base = logLik(null_glm)
D.base = -2 * L.base
n = dim(samDf)[1]
Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))
# A GLM of all weighted unifrac components
data.frame(
caps.P = capanova['Model', 'Pr(>F)'],
adonisP = medWuf$aov.tab[1, 'Pr(>F)'],
mir.P = mirkatP,
caps.F = capanova['Model', 'F'],
caps.R2 = medWufCap$CCA$tot.chi/medWufCap$tot.chi,
wuf1.P = glmAnova['MDS1', 'Pr(>Chi)'],
wuf1.DR = glmAnova['MDS1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev'],
wuf1.McFadden = Nagelkerke,
accuracy,
wuf1.coef = coef(loc_glm)[2]
#cap1.P = glmAnova['CAP1', 'Pr(>Chi)'],
#cap1.R2 = glmAnova['CAP1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev']
)
}
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& month == 0 & ct == 'T') -> test.immune1
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
## user system elapsed
## 0.681 0.012 0.694
tps
## caps.P adonisP mir.P caps.F caps.R2 wuf1.P wuf1.DR
## MDS1 0.0437 0.0479 0.0466 1.860805 0.0900595 0.01437564 0.2061418
## wuf1.McFadden accuracy wuf1.coef
## MDS1 0.3312048 0.6666667 2.184936
use.immune %>%
filter(type == 'CD4+' & month == 6.5 & ct == 'T') -> test.immune.pteg
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune.pteg, nperm = jnperm, transformation = medcode, family = 'binomial')
proc.time() - ptm
## user system elapsed
## 5.673 0.024 5.697
tps
## caps.P adonisP mir.P caps.F caps.R2 wuf1.P wuf1.DR
## MDS1 0.2514 0.24798 0.25199 1.214412 0.06065471 0.2176327 0.05229439
## wuf1.McFadden accuracy wuf1.coef
## MDS1 0.0931634 0.5714286 -0.9372546
# Run above function against every relevant variable.
ptm <- proc.time()
use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm))) -> permKernTable
## Warning: `env_bind_fns()` is deprecated as of rlang 0.3.0.
## Please use `env_bind_active()` instead.
## This warning is displayed once per session.
## Warning: `new_overscope()` is deprecated as of rlang 0.2.0.
## Please use `new_data_mask()` instead.
## This warning is displayed once per session.
## Warning: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
## Please use `eval_tidy()` with a data mask instead.
## This warning is displayed once per session.
## Warning: `overscope_clean()` is deprecated as of rlang 0.2.0.
## This warning is displayed once per session.
permKernTable
## # A tibble: 20 x 13
## # Groups: type, antigen, month [20]
## type antigen month caps.P adonisP mir.P caps.F caps.R2 wuf1.P
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.EN… 6.5 0.252 0.249 0.251 1.21 0.0607 0.218
## 2 CD4+ ANY.EN… 12 0.246 0.254 0.249 1.22 0.0642 0.228
## 3 IgA gp41 0 0.958 0.954 0.958 0.449 0.0233 0.651
## 4 IgA gp41 6.5 0.205 0.204 0.206 1.29 0.0644 0.143
## 5 IgA gp41 12 0.745 0.742 0.744 0.712 0.0384 0.855
## 6 IgA p24 0 0.895 0.893 0.893 0.552 0.0285 0.320
## 7 IgA p24 6.5 0.903 0.900 0.902 0.541 0.0279 0.650
## 8 IgA p24 12 0.385 0.383 0.385 1.04 0.0551 0.387
## 9 IgG Con.6.… 6.5 0.00427 0.00408 0.00431 2.82 0.130 0.00176
## 10 IgG Con.6.… 12 0.0365 0.0374 0.0358 1.96 0.0943 0.0100
## 11 IgG gp41 0 0.0469 0.0521 0.0461 1.86 0.0901 0.0144
## 12 IgG gp41 6.5 0.0456 0.0463 0.0446 1.87 0.0904 0.0305
## 13 IgG gp41 12 0.651 0.643 0.651 0.793 0.0404 0.779
## 14 IgG gp70_B… 6.5 0.906 0.904 0.904 0.539 0.0278 0.619
## 15 IgG gp70_B… 12 0.0350 0.0383 0.0346 1.97 0.0948 0.0143
## 16 IgG p24 0 0.214 0.215 0.218 1.28 0.0635 0.444
## 17 IgG p24 6.5 0.414 0.413 0.420 1.01 0.0538 0.397
## 18 IgG p24 12 0.287 0.280 0.286 1.17 0.0583 0.159
## 19 IgG ZM96.g… 6.5 0.0165 0.0164 0.0167 2.26 0.107 0.00897
## 20 IgG ZM96.g… 12 0.295 0.292 0.298 1.15 0.0577 0.162
## # ... with 4 more variables: wuf1.DR <dbl>, wuf1.McFadden <dbl>,
## # accuracy <dbl>, wuf1.coef <dbl>
proc.time() - ptm
## user system elapsed
## 116.813 0.280 117.103
The above function runs several extra tests. Results as follows:
type antigen visitno - things we run over
caps.P - Capscale test asks whether if we rotate things a bit and then try to use the best axis to compare to the data. Its similar to the wuf1.P value, but with some rotation
adonisP - p-value for a permanova test. Similar to mirkat p-value. One key exception is that igg_gp41_Month_0 falls on different sides of the 0.05 threshold.
mir.P is the p value for the kernel regression test, as run in the MiRKAT package. (Zhao et al., 2015)
caps.F and caps R2 are the f and r squared values for the capscale test.
wuf.P - is the p value of a glm comparing weighted unifrac component one against variables of interest. This test appears to always be statistically significantly positive when the mirkat test is positve.
wuf1.DR - one way of calculating an R2 value from a glm. We devide the deviance by the residual deviance
wuf1.McFadden - is a McFadden’s pseudo R^2. This turns out to be identical to the previous calculation.
accuracy - the fraction of the time that the glm predicts something falls above or below the median correctly. This turns out to not be super informative. Everything has around a 60% accuracy.
wuf1.coef - the coefficient of the glm model. The sign is relevant. Things with postive sign are associated with high values of weighted unifrac axis 1.
# Clean up so we just see the results of the kernel regression
concisePermKernTable <- permKernTable %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen,Month = month, Kernel_P = mir.P, Kernel_Q,
MDS1_P = wuf1.P, MDS1_Q, GlmMDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>%
pass -> concisePermKernTable
write.csv(format(concisePermKernTable, digits = 3), 'tables/concisePermkernTable.csv')
# export conditionally formatted table as html
colNames1 = c(' ' = 3, 'Kernel' = 2, 'MDS' = 4)
colNames2 = c('Type', 'Antigen', 'Month', 'P', 'Q', 'P', 'Q', 'R2', 'Coef' )
concisePermKernTable %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', '')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "html",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', '')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "html")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTable.html
concisePermKernTable.html
| Type | Antigen | Month | P | Q | P | Q | R2 | Coef |
|---|---|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 0.251 | 0.497 | 0.218 | 0.125 | 0.093 | -0.937 |
| 12 | 0.249 | 0.497 | 0.228 | 0.125 | 0.094 | -0.920 | ||
| IgA | gp41 | 0 | 0.958 | 0.958 | 0.651 | 0.219 | 0.013 | 0.333 |
| 6.5 | 0.206 | 0.497 | 0.143 | 0.109 | 0.129 | -1.133 | ||
| 12 | 0.744 | 0.930 | 0.855 | 0.259 | 0.002 | -0.136 | ||
| p24 | 0 | 0.893 | 0.952 | 0.320 | 0.161 | 0.061 | 0.752 | |
| 6.5 | 0.902 | 0.952 | 0.650 | 0.219 | 0.013 | -0.333 | ||
| 12 | 0.385 | 0.593 | 0.387 | 0.172 | 0.049 | -0.657 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.004 | 0.086 | 0.002 | 0.011 | 0.497 | -3.104 |
| 12 | 0.036 | 0.154 | 0.010 | 0.017 | 0.361 | -2.289 | ||
| gp41 | 0 | 0.046 | 0.154 | 0.014 | 0.017 | 0.331 | 2.185 | |
| 6.5 | 0.045 | 0.154 | 0.030 | 0.031 | 0.267 | -1.809 | ||
| 12 | 0.651 | 0.868 | 0.779 | 0.248 | 0.005 | -0.205 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.904 | 0.952 | 0.619 | 0.219 | 0.016 | -0.365 | |
| 12 | 0.035 | 0.154 | 0.014 | 0.017 | 0.332 | -2.135 | ||
| p24 | 0 | 0.218 | 0.497 | 0.444 | 0.179 | 0.037 | -0.567 | |
| 6.5 | 0.420 | 0.600 | 0.397 | 0.172 | 0.047 | -0.749 | ||
| 12 | 0.286 | 0.497 | 0.159 | 0.109 | 0.120 | -1.085 | ||
| ZM96.gp140 | 6.5 | 0.017 | 0.154 | 0.009 | 0.017 | 0.370 | -2.338 | |
| 12 | 0.298 | 0.497 | 0.162 | 0.109 | 0.118 | -1.076 |
concisePermKernTable.html %>% cat(file = 'tables/concisePermkernTable.html')
Latex version of the same table
docHead <- "\\documentclass[12pt]{article} % use larger type; default would be 10pt
\\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
\\usepackage{booktabs}
\\usepackage{longtable}
\\usepackage{array}
\\usepackage{multirow}
\\usepackage[table]{xcolor}
\\usepackage{wrapfig}
\\usepackage{float}
\\usepackage{colortbl}
\\usepackage{pdflscape}
\\usepackage{tabu}
\\usepackage{threeparttable}
\\usepackage{threeparttablex}
\\usepackage[normalem]{ulem}
\\usepackage{makecell}
\\definecolor{green}{rgb}{1, 1, .9}
\\begin{document}
"
docTail <- "\\end{document}
"
# Make latex table
concisePermKernTable %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "latex",
bold = ifelse(MDS1_P < 0.05,
T,
F),
italic = ifelse(MDS1_P < 0.05 & MDS1_Coef < 0,
T,
F)),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'green', 'white')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "latex",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'green', 'white')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "latex")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTable.tex
# Print latex table to tex file
cat(docHead, concisePermKernTable.tex, docTail, file = 'tables/concisePermkernTable1.tex')
concisePermKernTable %>% filter(Kernel_P < 0.05) -> shortPermkernTable
shortPermkernTable
## Type Antigen Month Kernel_P Kernel_Q MDS1_P MDS1_Q
## 1 IgG Con.6.gp120.B 6.5 0.00431 0.0862 0.001759476 0.01064944
## 2 IgG Con.6.gp120.B 12.0 0.03584 0.1538 0.010022385 0.01740207
## 3 IgG gp41 0.0 0.04614 0.1538 0.014375638 0.01740207
## 4 IgG gp41 6.5 0.04457 0.1538 0.030461387 0.03072856
## 5 IgG gp70_B.CaseA_V1_V2 12.0 0.03465 0.1538 0.014277612 0.01740207
## 6 IgG ZM96.gp140 6.5 0.01671 0.1538 0.008971440 0.01740207
## GlmMDS1_R2 MDS1_Coef
## 1 0.4969907 -3.103554
## 2 0.3612921 -2.289438
## 3 0.3312048 2.184936
## 4 0.2667186 -1.808614
## 5 0.3317812 -2.134684
## 6 0.3704046 -2.338385
write.csv(format(shortPermkernTable, digits = 3), 'tables/shortPermkernTable.csv')
Of kernel regression P values
my_runif <- function(Len){
loc_runif <- runif(n = Len)
sort_loc_runif <- sort(loc_runif)
data.frame(case = 1:Len, relement = sort_loc_runif)
}
calc_bound <- function(df, bound){
quantile(df$relement, bound)
}
make_qqdata <- function(pvec, nboot = 10000){
locP <- pvec
LP <- length(locP)
#1:10 %>% map(runif, n = LP)
sortedP <- sort(locP)
exp2 <- 1:length(locP)/length(locP)
sortedExpP <- sort(exp2)
random_pvalues <- data.frame(iter = 1:nboot) %>%
mutate(rand = map(iter, ~my_runif(Len = LP))) %>% unnest %>%
nest(-case) %>%
mutate(lb = map_dbl(data, ~calc_bound(., 0.025)),
ub = map_dbl(data, ~calc_bound(.,0.975))) %>% dplyr::select (-data)
qqdata <- bind_cols(sortedP = sortedP, sortedExpP = sortedExpP, random_pvalues)
return(qqdata)
}
qqdata_permKernMir <- make_qqdata(permKernTable$mir.P)
qqdata_permKernMir %>% str
## Classes 'tbl_df', 'tbl' and 'data.frame': 20 obs. of 5 variables:
## $ sortedP : num 0.00431 0.01671 0.03465 0.03584 0.04457 ...
## $ sortedExpP: num 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 ...
## $ case : int 1 2 3 4 5 6 7 8 9 10 ...
## $ lb : num 0.00123 0.01314 0.03293 0.05827 0.08676 ...
## $ ub : num 0.169 0.247 0.317 0.38 0.441 ...
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
ggsave('figures/qq_permKernMir.png')
## Saving 7 x 5 in image
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) + geom_line(aes(y = lb), colour = "grey40") + geom_line(aes(y = ub), colour = "grey40") + scale_y_log10() + scale_x_log10() +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
We see from the qqplot that we generally fall below the 1:1 line, suggesting that our P values are lower than we would expect from chance. To make the grey lines, I sampled 95% confidence intervals of random draws from the uniform distribution (essentially random p values). The data points sometimes, but not always fall below the lower one of these. However, we care about all of the p values, not wheter eaach individual is more unusual than we would expect by chance. I’ll replort just the regular qqplot as a supplemental figure.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = function(x){jac_box_cox_2(x)}, family = 'gaussian')
proc.time() - ptm
## user system elapsed
## 0.648 0.000 0.648
tps
## caps.P adonisP mir.P caps.F caps.R2 wuf1.P wuf1.DR
## MDS1 0.2222 0.2538 0.2194 1.311871 0.06520801 0.03795074 0.1848023
## wuf1.McFadden accuracy wuf1.coef
## MDS1 0.1969076 0 0.700859
# Run above function against every relevant variable.
ptm <- proc.time()
use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm,
transformation = function(x){jac_box_cox_2(x)},
family = 'gaussian'))) -> permKernTableGaus
permKernTableGaus
## # A tibble: 20 x 13
## # Groups: type, antigen, month [20]
## type antigen month caps.P adonisP mir.P caps.F caps.R2 wuf1.P
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.EN… 6.5 0.603 6.02e-1 0.602 0.827 0.0421 5.97e-1
## 2 CD4+ ANY.EN… 12 0.185 1.94e-1 0.185 1.36 0.0707 3.29e-1
## 3 IgA gp41 0 0.990 9.88e-1 0.989 0.311 0.0163 6.33e-1
## 4 IgA gp41 6.5 0.411 4.07e-1 0.411 1.00 0.0505 6.01e-1
## 5 IgA gp41 12 0.655 6.52e-1 0.651 0.755 0.0406 5.04e-1
## 6 IgA p24 0 0.953 9.53e-1 0.954 0.417 0.0217 4.33e-1
## 7 IgA p24 6.5 0.976 9.73e-1 0.976 0.373 0.0194 8.57e-1
## 8 IgA p24 12 0.574 5.75e-1 0.575 0.852 0.0456 3.09e-1
## 9 IgG Con.6.… 6.5 0.0736 7.60e-2 0.0731 1.72 0.0839 3.16e-2
## 10 IgG Con.6.… 12 0.0007 6.00e-4 0.00068 3.55 0.159 1.43e-5
## 11 IgG gp41 0 0.221 2.48e-1 0.221 1.31 0.0652 3.80e-2
## 12 IgG gp41 6.5 0.220 2.25e-1 0.219 1.29 0.0643 7.98e-2
## 13 IgG gp41 12 0.266 2.71e-1 0.263 1.23 0.0613 1.73e-1
## 14 IgG gp70_B… 6.5 0.287 2.93e-1 0.282 1.19 0.0595 2.06e-1
## 15 IgG gp70_B… 12 0.110 1.12e-1 0.111 1.55 0.0763 8.34e-2
## 16 IgG p24 0 0.754 7.45e-1 0.753 0.636 0.0327 9.07e-1
## 17 IgG p24 6.5 0.0268 2.61e-2 0.0272 2.15 0.108 5.30e-2
## 18 IgG p24 12 0.411 4.14e-1 0.413 1.02 0.0513 1.33e-1
## 19 IgG ZM96.g… 6.5 0.0293 2.96e-2 0.0290 2.24 0.106 1.69e-2
## 20 IgG ZM96.g… 12 0.204 2.00e-1 0.198 1.32 0.0657 7.77e-2
## # ... with 4 more variables: wuf1.DR <dbl>, wuf1.McFadden <dbl>,
## # accuracy <dbl>, wuf1.coef <dbl>
proc.time() - ptm
## user system elapsed
## 117.617 0.532 118.160
# Clean up so we just see the results of the kernel regression
concisePermKernTableGaus <- permKernTableGaus %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Kernel_P = mir.P, Kernel_Q,
MDS1_P = wuf1.P, MDS1_Q, MDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>%
pass
concisePermKernTableGaus
## Type Antigen Month Kernel_P Kernel_Q MDS1_P
## 1 CD4+ ANY.ENV.PTEG 6.5 0.60213 0.8028400 5.971764e-01
## 2 CD4+ ANY.ENV.PTEG 12.0 0.18544 0.4920667 3.291117e-01
## 3 IgA gp41 0.0 0.98896 0.9889600 6.330939e-01
## 4 IgA gp41 6.5 0.41053 0.6358769 6.006684e-01
## 5 IgA gp41 12.0 0.65053 0.8131625 5.043440e-01
## 6 IgA p24 0.0 0.95380 0.9889600 4.331569e-01
## 7 IgA p24 6.5 0.97610 0.9889600 8.573830e-01
## 8 IgA p24 12.0 0.57478 0.8028400 3.091250e-01
## 9 IgG Con.6.gp120.B 6.5 0.07311 0.3655500 3.164215e-02
## 10 IgG Con.6.gp120.B 12.0 0.00068 0.0136000 1.431333e-05
## 11 IgG gp41 0.0 0.22143 0.4920667 3.795074e-02
## 12 IgG gp41 6.5 0.21873 0.4920667 7.975175e-02
## 13 IgG gp41 12.0 0.26339 0.5136000 1.731298e-01
## 14 IgG gp70_B.CaseA_V1_V2 6.5 0.28248 0.5136000 2.062894e-01
## 15 IgG gp70_B.CaseA_V1_V2 12.0 0.11097 0.4438800 8.341959e-02
## 16 IgG p24 0.0 0.75279 0.8856353 9.069707e-01
## 17 IgG p24 6.5 0.02725 0.1932667 5.296858e-02
## 18 IgG p24 12.0 0.41332 0.6358769 1.334352e-01
## 19 IgG ZM96.gp140 6.5 0.02899 0.1932667 1.691120e-02
## 20 IgG ZM96.gp140 12.0 0.19822 0.4920667 7.767448e-02
## MDS1_Q MDS1_R2 MDS1_Coef
## 1 0.3441069304 0.0154346775 -0.19622223
## 2 0.2476839187 0.0535522002 -0.35622365
## 3 0.3441069304 0.0126283362 0.17748943
## 4 0.3441069304 0.0151466567 -0.19438279
## 5 0.3289526950 0.0257545280 0.25159262
## 6 0.3027018097 0.0333728299 0.28853330
## 7 0.4414881726 0.0018079718 -0.06715761
## 8 0.2476839187 0.0579135356 -0.37727770
## 9 0.0928235317 0.2083288787 -0.72089857
## 10 0.0001400356 0.5303144044 -1.15018086
## 11 0.0928235317 0.1969076139 0.70085904
## 12 0.1020177828 0.1482127646 -0.60805411
## 13 0.1693829682 0.0948033344 -0.48630779
## 14 0.1834771570 0.0826277473 -0.45400682
## 15 0.1020177828 0.1451699474 -0.60178005
## 16 0.4436710261 0.0007652868 0.04369297
## 17 0.1020177828 0.1835313650 -0.78663031
## 18 0.1450526857 0.1129112051 -0.53072304
## 19 0.0827260298 0.2460702885 -0.78348198
## 20 0.1020177828 0.1499943359 -0.61169771
write.csv(format(concisePermKernTableGaus, digits = 3), 'tables/concisePermkernTableGaus.csv')
# export conditionally formatted table as html
concisePermKernTableGaus %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format_round(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', '')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "html",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', '')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
),
Month = cell_spec(Month, "html")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTableGaus.html
concisePermKernTableGaus.html
| Type | Antigen | Month | P | Q | P | Q | R2 | Coef |
|---|---|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 0.602 | 0.803 | 0.597 | 0.344 | 0.015 | -0.196 |
| 12 | 0.185 | 0.492 | 0.329 | 0.248 | 0.054 | -0.356 | ||
| IgA | gp41 | 0 | 0.989 | 0.989 | 0.633 | 0.344 | 0.013 | 0.177 |
| 6.5 | 0.411 | 0.636 | 0.601 | 0.344 | 0.015 | -0.194 | ||
| 12 | 0.651 | 0.813 | 0.504 | 0.329 | 0.026 | 0.252 | ||
| p24 | 0 | 0.954 | 0.989 | 0.433 | 0.303 | 0.033 | 0.289 | |
| 6.5 | 0.976 | 0.989 | 0.857 | 0.441 | 0.002 | -0.067 | ||
| 12 | 0.575 | 0.803 | 0.309 | 0.248 | 0.058 | -0.377 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.073 | 0.366 | 0.032 | 0.093 | 0.208 | -0.721 |
| 12 | 0.001 | 0.014 | 0.000 | 0.000 | 0.530 | -1.150 | ||
| gp41 | 0 | 0.221 | 0.492 | 0.038 | 0.093 | 0.197 | 0.701 | |
| 6.5 | 0.219 | 0.492 | 0.080 | 0.102 | 0.148 | -0.608 | ||
| 12 | 0.263 | 0.514 | 0.173 | 0.169 | 0.095 | -0.486 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.282 | 0.514 | 0.206 | 0.183 | 0.083 | -0.454 | |
| 12 | 0.111 | 0.444 | 0.083 | 0.102 | 0.145 | -0.602 | ||
| p24 | 0 | 0.753 | 0.886 | 0.907 | 0.444 | 0.001 | 0.044 | |
| 6.5 | 0.027 | 0.193 | 0.053 | 0.102 | 0.184 | -0.787 | ||
| 12 | 0.413 | 0.636 | 0.133 | 0.145 | 0.113 | -0.531 | ||
| ZM96.gp140 | 6.5 | 0.029 | 0.193 | 0.017 | 0.083 | 0.246 | -0.783 | |
| 12 | 0.198 | 0.492 | 0.078 | 0.102 | 0.150 | -0.612 |
concisePermKernTableGaus.html %>% cat(file = 'tables/concisePermkernTableGaus.html')
concisePermKernTableGaus %>% filter(Kernel_P < 0.05) -> shortPermkernTableGaus
shortPermkernTableGaus
## Type Antigen Month Kernel_P Kernel_Q MDS1_P MDS1_Q
## 1 IgG Con.6.gp120.B 12.0 0.00068 0.0136000 1.431333e-05 0.0001400356
## 2 IgG p24 6.5 0.02725 0.1932667 5.296858e-02 0.1020177828
## 3 IgG ZM96.gp140 6.5 0.02899 0.1932667 1.691120e-02 0.0827260298
## MDS1_R2 MDS1_Coef
## 1 0.5303144 -1.1501809
## 2 0.1835314 -0.7866303
## 3 0.2460703 -0.7834820
write.csv(format(shortPermkernTableGaus, digits = 3), 'tables/shortPermkernTable.csv')
# Make latex table
concisePermKernTableGaus %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'green', 'white')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "latex",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'green', 'white')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "latex")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTableGaus.tex
# Print latex table to tex file
cat(docHead, concisePermKernTableGaus.tex, docTail, file = 'tables/concisePermkernTableGaus.tex')
### QQplot for kernel regression data
qqdata_permKernMirGaus <- make_qqdata(permKernTableGaus$mir.P)
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMirGaus %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
ggsave('figures/qq_permKernMirGaus.png')
## Saving 7 x 5 in image
use.immune %>% dplyr::select(pub_id, visitno, type, antigen, mag) -> tmp
full_join(tmp, tmp, by = 'pub_id') %>%
group_by(visitno.x, type.x, antigen.x, visitno.y, type.y, antigen.y) %>%
nest %>%
mutate(x2 = map(data, function(df){unwarn(chisq.test(df$mag.x, df$mag.y))})) %>%
mutate(glance = map(x2, glance)) %>%
dplyr::select(-data, -x2) %>%
unnest(glance) %>%
#mutate(q.value = p2q(p.value)) %>% # reurns NaNs
pass -> compareImmuneX2
compareImmuneX2 %>% filter(
type.x == 'IgG' &
antigen.x == 'gp41' &
type.y == 'IgG' &
antigen.y == 'gp41'
)
## # A tibble: 9 x 10
## visitno.x type.x antigen.x visitno.y type.y antigen.y statistic p.value
## <int> <chr> <chr> <int> <chr> <chr> <dbl> <dbl>
## 1 9 IgG gp41 9 IgG gp41 420 0.236
## 2 9 IgG gp41 12 IgG gp41 378 0.247
## 3 9 IgG gp41 2 IgG gp41 420 0.236
## 4 12 IgG gp41 9 IgG gp41 378 0.247
## 5 12 IgG gp41 12 IgG gp41 378 0.0207
## 6 12 IgG gp41 2 IgG gp41 378 0.247
## 7 2 IgG gp41 9 IgG gp41 420 0.236
## 8 2 IgG gp41 12 IgG gp41 378 0.247
## 9 2 IgG gp41 2 IgG gp41 420 0.236
## # ... with 2 more variables: parameter <int>, method <chr>
compareImmuneX2 %>%
filter(type.x == 'IgG' & type.y == 'IgG' & antigen.x != antigen.y) %>%
write_csv('tables/chisq_IgG_comparasons.csv')
psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname') %>%
sample_data %>% as('data.frame') %>% rownames_to_column -> hereSam
data_frame(formula = paste("transformation(ydata) ~ MDS", 1:10, sep = ""))
## # A tibble: 10 x 1
## formula
## <chr>
## 1 transformation(ydata) ~ MDS1
## 2 transformation(ydata) ~ MDS2
## 3 transformation(ydata) ~ MDS3
## 4 transformation(ydata) ~ MDS4
## 5 transformation(ydata) ~ MDS5
## 6 transformation(ydata) ~ MDS6
## 7 transformation(ydata) ~ MDS7
## 8 transformation(ydata) ~ MDS8
## 9 transformation(ydata) ~ MDS9
## 10 transformation(ydata) ~ MDS10
EachMDS <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
## Pull out the needed data
psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname')
# medWuf <- NA
# rankWuf <- NA
locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
#loc.wuf <- wufKN2
#loc.jsd <- jsdKN2
ydata <- ydata0
ydata <- ydata0[!yna]
loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
.[!yna,]
# # Is giving only positive results with CAP1, not sure why
loc_glm <- glm(as.formula("transformation(ydata) ~ MDS1"), data = samDf, family = family)
glmAnova <- loc_glm %>% anova(test = "Chisq")
# data_frame, rather than data.frame
# https://stackoverflow.com/questions/48450308/iterating-over-formulas-in-purrr#48450308
data_frame(formulaString = paste("transformation(ydata) ~ MDS", 1:10, sep = "")) %>%
mutate(model = map(formulaString, function(fs){
glm(as.formula(fs), data = samDf, family = family)})) %>%
mutate(anova = map(model, anova)) %>%
mutate(glance = map(model, glance)) %>%
mutate(tidy = map(model, tidy)) %>%
mutate(coef = map(model, ~ coef(summary(.))[2,])) %>%
pass -> allmodels
allmodels %>% dplyr::select("tidy") %>% unnest %>% filter(term != '(Intercept)')
}
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- EachMDS(test.immune.pteg, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
## user system elapsed
## 0.145 0.000 0.145
tps
## # A tibble: 10 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 MDS1 -0.937 0.793 -1.18 0.237
## 2 MDS2 0.447 0.758 0.590 0.555
## 3 MDS3 -0.214 0.732 -0.292 0.771
## 4 MDS4 -1.46 0.908 -1.61 0.108
## 5 MDS5 -0.997 0.844 -1.18 0.237
## 6 MDS6 -1.47 0.907 -1.62 0.104
## 7 MDS7 0.267 0.736 0.363 0.717
## 8 MDS8 -1.74 1.01 -1.73 0.0838
## 9 MDS9 0.360 0.741 0.487 0.627
## 10 MDS10 0.784 0.787 0.996 0.319
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(coefs = map(data, ~ EachMDS(.))) %>%
dplyr::select(-data) %>% unnest(coefs) -> glmMDScoefs
ants1
## [1] "Con.6.gp120.B" "ZM96.gp140" "gp70_B.CaseA_V1_V2"
glmMDScoefs %>%
gather(key = "key", value = "value", estimate:p.value) %>%
filter(key == "p.value") %>%
spread(key = term, value = value) %>%
dplyr::select(-key, -MDS10, MDS10) %>%
dplyr::rename(Type = type, Antigen = antigen, Month = month) %>%
mutate(Type = factor(Type, levels = c( "IgA", "IgG", "CD4+"))) %>%
mutate(Antigen = factor(Antigen, levels = c(ants1, ants2, "ANY.ENV.PTEG"))) %>%
#Clean up labels
mutate(Antigen = stringr::str_replace_all(Antigen, "_", " ")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "V1 V2", "V1-V2")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "ANY.ENV.PTEG", "Any ENV PTEG")) %>%
arrange(Type) -> allMDS
allMDS
## # A tibble: 20 x 13
## Type Antigen Month MDS1 MDS2 MDS3 MDS4 MDS5 MDS6 MDS7 MDS8
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 IgA gp41 0 0.653 0.607 0.401 0.702 0.786 0.650 0.242 0.349
## 2 IgA gp41 6.5 0.168 0.368 0.701 0.954 0.112 0.142 0.456 0.475
## 3 IgA gp41 12 0.855 0.201 0.462 0.408 0.994 0.798 0.469 0.106
## 4 IgA p24 0 0.335 0.629 0.520 0.939 0.666 0.657 0.148 0.853
## 5 IgA p24 6.5 0.652 0.281 0.681 0.414 0.438 0.599 0.999 0.199
## 6 IgA p24 12 0.398 0.724 0.227 0.202 0.128 0.465 0.445 0.156
## 7 IgG Con.6.… 6.5 0.0175 0.536 0.420 0.890 0.325 0.427 0.853 0.806
## 8 IgG Con.6.… 12 0.0315 0.878 0.377 0.635 0.989 0.540 0.569 0.374
## 9 IgG gp41 0 0.0399 0.894 0.304 0.967 0.442 0.231 0.504 0.595
## 10 IgG gp41 6.5 0.0567 0.274 0.226 0.891 0.858 0.165 0.601 0.426
## 11 IgG gp41 12 0.779 0.310 0.443 0.228 0.680 0.268 0.995 0.0391
## 12 IgG gp70 B… 6.5 0.621 0.853 0.300 0.458 0.507 0.690 0.506 0.161
## 13 IgG gp70 B… 12 0.0373 0.665 0.808 0.447 0.318 0.296 0.743 0.143
## 14 IgG p24 0 0.451 0.231 0.986 0.0876 0.149 0.0510 0.882 0.507
## 15 IgG p24 6.5 0.404 0.828 0.243 0.250 0.689 0.0819 0.180 0.846
## 16 IgG p24 12 0.182 0.721 0.237 0.676 0.933 0.371 0.0418 0.423
## 17 IgG ZM96.g… 6.5 0.0299 0.631 0.244 0.750 0.234 0.949 0.890 0.994
## 18 IgG ZM96.g… 12 0.186 0.353 0.286 0.722 0.889 0.409 0.459 0.376
## 19 CD4+ Any EN… 6.5 0.237 0.555 0.771 0.108 0.237 0.104 0.717 0.0838
## 20 CD4+ Any EN… 12 0.248 0.565 0.434 0.103 0.117 0.856 0.861 0.128
## # ... with 2 more variables: MDS9 <dbl>, MDS10 <dbl>
allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
as.character() -> allMDS.html
allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full")
| Type | Antigen | Month | MDS1 | MDS2 | MDS3 | MDS4 | MDS5 | MDS6 | MDS7 | MDS8 | MDS9 | MDS10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IgA | gp41 | 0.0 | 0.653 | 0.607 | 0.401 | 0.702 | 0.786 | 0.650 | 0.242 | 0.349 | 0.038 | 0.700 |
| 6.5 | 0.168 | 0.368 | 0.701 | 0.954 | 0.112 | 0.142 | 0.456 | 0.475 | 0.789 | 0.864 | ||
| 12.0 | 0.855 | 0.201 | 0.462 | 0.408 | 0.994 | 0.798 | 0.469 | 0.106 | 0.079 | 0.232 | ||
| p24 | 0.0 | 0.335 | 0.629 | 0.520 | 0.939 | 0.666 | 0.657 | 0.148 | 0.853 | 0.921 | 0.517 | |
| 6.5 | 0.652 | 0.281 | 0.681 | 0.414 | 0.438 | 0.599 | 0.999 | 0.199 | 0.881 | 0.351 | ||
| 12.0 | 0.398 | 0.724 | 0.227 | 0.202 | 0.128 | 0.465 | 0.445 | 0.156 | 0.516 | 0.159 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.017 | 0.536 | 0.420 | 0.890 | 0.325 | 0.427 | 0.853 | 0.806 | 0.634 | 0.250 |
| 12.0 | 0.031 | 0.878 | 0.377 | 0.635 | 0.989 | 0.540 | 0.569 | 0.374 | 0.686 | 0.299 | ||
| gp41 | 0.0 | 0.040 | 0.894 | 0.304 | 0.967 | 0.442 | 0.231 | 0.504 | 0.595 | 0.856 | 0.509 | |
| 6.5 | 0.057 | 0.274 | 0.226 | 0.891 | 0.858 | 0.165 | 0.601 | 0.426 | 0.212 | 0.606 | ||
| 12.0 | 0.779 | 0.310 | 0.443 | 0.228 | 0.680 | 0.268 | 0.995 | 0.039 | 0.357 | 0.054 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.621 | 0.853 | 0.300 | 0.458 | 0.507 | 0.690 | 0.506 | 0.161 | 0.039 | 0.880 | |
| 12.0 | 0.037 | 0.665 | 0.808 | 0.447 | 0.318 | 0.296 | 0.743 | 0.143 | 0.403 | 0.276 | ||
| p24 | 0.0 | 0.451 | 0.231 | 0.986 | 0.088 | 0.149 | 0.051 | 0.882 | 0.507 | 0.384 | 0.439 | |
| 6.5 | 0.404 | 0.828 | 0.243 | 0.250 | 0.689 | 0.082 | 0.180 | 0.846 | 0.360 | 0.101 | ||
| 12.0 | 0.182 | 0.721 | 0.237 | 0.676 | 0.933 | 0.371 | 0.042 | 0.423 | 0.716 | 0.147 | ||
| ZM96.gp140 | 6.5 | 0.030 | 0.631 | 0.244 | 0.750 | 0.234 | 0.949 | 0.890 | 0.994 | 0.090 | 0.504 | |
| 12.0 | 0.186 | 0.353 | 0.286 | 0.722 | 0.889 | 0.409 | 0.459 | 0.376 | 0.021 | 0.189 | ||
| CD4+ | Any ENV PTEG | 6.5 | 0.237 | 0.555 | 0.771 | 0.108 | 0.237 | 0.104 | 0.717 | 0.084 | 0.627 | 0.319 |
| 12.0 | 0.248 | 0.565 | 0.434 | 0.103 | 0.117 | 0.856 | 0.861 | 0.128 | 0.773 | 0.723 |
allMDS.html
## [1] "<table>\n <thead>\n <tr>\n <th style=\"text-align:center;\"> Type </th>\n <th style=\"text-align:center;\"> Antigen </th>\n <th style=\"text-align:center;\"> Month </th>\n <th style=\"text-align:center;\"> MDS1 </th>\n <th style=\"text-align:center;\"> MDS2 </th>\n <th style=\"text-align:center;\"> MDS3 </th>\n <th style=\"text-align:center;\"> MDS4 </th>\n <th style=\"text-align:center;\"> MDS5 </th>\n <th style=\"text-align:center;\"> MDS6 </th>\n <th style=\"text-align:center;\"> MDS7 </th>\n <th style=\"text-align:center;\"> MDS8 </th>\n <th style=\"text-align:center;\"> MDS9 </th>\n <th style=\"text-align:center;\"> MDS10 </th>\n </tr>\n </thead>\n<tbody>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"6\"> IgA </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.653 </td>\n <td style=\"text-align:center;\"> 0.607 </td>\n <td style=\"text-align:center;\"> 0.401 </td>\n <td style=\"text-align:center;\"> 0.702 </td>\n <td style=\"text-align:center;\"> 0.786 </td>\n <td style=\"text-align:center;\"> 0.650 </td>\n <td style=\"text-align:center;\"> 0.242 </td>\n <td style=\"text-align:center;\"> 0.349 </td>\n <td style=\"text-align:center;\"> 0.038 </td>\n <td style=\"text-align:center;\"> 0.700 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.168 </td>\n <td style=\"text-align:center;\"> 0.368 </td>\n <td style=\"text-align:center;\"> 0.701 </td>\n <td style=\"text-align:center;\"> 0.954 </td>\n <td style=\"text-align:center;\"> 0.112 </td>\n <td style=\"text-align:center;\"> 0.142 </td>\n <td style=\"text-align:center;\"> 0.456 </td>\n <td style=\"text-align:center;\"> 0.475 </td>\n <td style=\"text-align:center;\"> 0.789 </td>\n <td style=\"text-align:center;\"> 0.864 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.855 </td>\n <td style=\"text-align:center;\"> 0.201 </td>\n <td style=\"text-align:center;\"> 0.462 </td>\n <td style=\"text-align:center;\"> 0.408 </td>\n <td style=\"text-align:center;\"> 0.994 </td>\n <td style=\"text-align:center;\"> 0.798 </td>\n <td style=\"text-align:center;\"> 0.469 </td>\n <td style=\"text-align:center;\"> 0.106 </td>\n <td style=\"text-align:center;\"> 0.079 </td>\n <td style=\"text-align:center;\"> 0.232 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.335 </td>\n <td style=\"text-align:center;\"> 0.629 </td>\n <td style=\"text-align:center;\"> 0.520 </td>\n <td style=\"text-align:center;\"> 0.939 </td>\n <td style=\"text-align:center;\"> 0.666 </td>\n <td style=\"text-align:center;\"> 0.657 </td>\n <td style=\"text-align:center;\"> 0.148 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.921 </td>\n <td style=\"text-align:center;\"> 0.517 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.652 </td>\n <td style=\"text-align:center;\"> 0.281 </td>\n <td style=\"text-align:center;\"> 0.681 </td>\n <td style=\"text-align:center;\"> 0.414 </td>\n <td style=\"text-align:center;\"> 0.438 </td>\n <td style=\"text-align:center;\"> 0.599 </td>\n <td style=\"text-align:center;\"> 0.999 </td>\n <td style=\"text-align:center;\"> 0.199 </td>\n <td style=\"text-align:center;\"> 0.881 </td>\n <td style=\"text-align:center;\"> 0.351 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.398 </td>\n <td style=\"text-align:center;\"> 0.724 </td>\n <td style=\"text-align:center;\"> 0.227 </td>\n <td style=\"text-align:center;\"> 0.202 </td>\n <td style=\"text-align:center;\"> 0.128 </td>\n <td style=\"text-align:center;\"> 0.465 </td>\n <td style=\"text-align:center;\"> 0.445 </td>\n <td style=\"text-align:center;\"> 0.156 </td>\n <td style=\"text-align:center;\"> 0.516 </td>\n <td style=\"text-align:center;\"> 0.159 </td>\n </tr>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"12\"> IgG </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Con.6.gp120.B </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.017 </td>\n <td style=\"text-align:center;\"> 0.536 </td>\n <td style=\"text-align:center;\"> 0.420 </td>\n <td style=\"text-align:center;\"> 0.890 </td>\n <td style=\"text-align:center;\"> 0.325 </td>\n <td style=\"text-align:center;\"> 0.427 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.806 </td>\n <td style=\"text-align:center;\"> 0.634 </td>\n <td style=\"text-align:center;\"> 0.250 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.031 </td>\n <td style=\"text-align:center;\"> 0.878 </td>\n <td style=\"text-align:center;\"> 0.377 </td>\n <td style=\"text-align:center;\"> 0.635 </td>\n <td style=\"text-align:center;\"> 0.989 </td>\n <td style=\"text-align:center;\"> 0.540 </td>\n <td style=\"text-align:center;\"> 0.569 </td>\n <td style=\"text-align:center;\"> 0.374 </td>\n <td style=\"text-align:center;\"> 0.686 </td>\n <td style=\"text-align:center;\"> 0.299 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.040 </td>\n <td style=\"text-align:center;\"> 0.894 </td>\n <td style=\"text-align:center;\"> 0.304 </td>\n <td style=\"text-align:center;\"> 0.967 </td>\n <td style=\"text-align:center;\"> 0.442 </td>\n <td style=\"text-align:center;\"> 0.231 </td>\n <td style=\"text-align:center;\"> 0.504 </td>\n <td style=\"text-align:center;\"> 0.595 </td>\n <td style=\"text-align:center;\"> 0.856 </td>\n <td style=\"text-align:center;\"> 0.509 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.057 </td>\n <td style=\"text-align:center;\"> 0.274 </td>\n <td style=\"text-align:center;\"> 0.226 </td>\n <td style=\"text-align:center;\"> 0.891 </td>\n <td style=\"text-align:center;\"> 0.858 </td>\n <td style=\"text-align:center;\"> 0.165 </td>\n <td style=\"text-align:center;\"> 0.601 </td>\n <td style=\"text-align:center;\"> 0.426 </td>\n <td style=\"text-align:center;\"> 0.212 </td>\n <td style=\"text-align:center;\"> 0.606 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.779 </td>\n <td style=\"text-align:center;\"> 0.310 </td>\n <td style=\"text-align:center;\"> 0.443 </td>\n <td style=\"text-align:center;\"> 0.228 </td>\n <td style=\"text-align:center;\"> 0.680 </td>\n <td style=\"text-align:center;\"> 0.268 </td>\n <td style=\"text-align:center;\"> 0.995 </td>\n <td style=\"text-align:center;\"> 0.039 </td>\n <td style=\"text-align:center;\"> 0.357 </td>\n <td style=\"text-align:center;\"> 0.054 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> gp70 B.CaseA V1-V2 </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.621 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.300 </td>\n <td style=\"text-align:center;\"> 0.458 </td>\n <td style=\"text-align:center;\"> 0.507 </td>\n <td style=\"text-align:center;\"> 0.690 </td>\n <td style=\"text-align:center;\"> 0.506 </td>\n <td style=\"text-align:center;\"> 0.161 </td>\n <td style=\"text-align:center;\"> 0.039 </td>\n <td style=\"text-align:center;\"> 0.880 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.037 </td>\n <td style=\"text-align:center;\"> 0.665 </td>\n <td style=\"text-align:center;\"> 0.808 </td>\n <td style=\"text-align:center;\"> 0.447 </td>\n <td style=\"text-align:center;\"> 0.318 </td>\n <td style=\"text-align:center;\"> 0.296 </td>\n <td style=\"text-align:center;\"> 0.743 </td>\n <td style=\"text-align:center;\"> 0.143 </td>\n <td style=\"text-align:center;\"> 0.403 </td>\n <td style=\"text-align:center;\"> 0.276 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.451 </td>\n <td style=\"text-align:center;\"> 0.231 </td>\n <td style=\"text-align:center;\"> 0.986 </td>\n <td style=\"text-align:center;\"> 0.088 </td>\n <td style=\"text-align:center;\"> 0.149 </td>\n <td style=\"text-align:center;\"> 0.051 </td>\n <td style=\"text-align:center;\"> 0.882 </td>\n <td style=\"text-align:center;\"> 0.507 </td>\n <td style=\"text-align:center;\"> 0.384 </td>\n <td style=\"text-align:center;\"> 0.439 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.404 </td>\n <td style=\"text-align:center;\"> 0.828 </td>\n <td style=\"text-align:center;\"> 0.243 </td>\n <td style=\"text-align:center;\"> 0.250 </td>\n <td style=\"text-align:center;\"> 0.689 </td>\n <td style=\"text-align:center;\"> 0.082 </td>\n <td style=\"text-align:center;\"> 0.180 </td>\n <td style=\"text-align:center;\"> 0.846 </td>\n <td style=\"text-align:center;\"> 0.360 </td>\n <td style=\"text-align:center;\"> 0.101 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.182 </td>\n <td style=\"text-align:center;\"> 0.721 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.676 </td>\n <td style=\"text-align:center;\"> 0.933 </td>\n <td style=\"text-align:center;\"> 0.371 </td>\n <td style=\"text-align:center;\"> 0.042 </td>\n <td style=\"text-align:center;\"> 0.423 </td>\n <td style=\"text-align:center;\"> 0.716 </td>\n <td style=\"text-align:center;\"> 0.147 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> ZM96.gp140 </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.030 </td>\n <td style=\"text-align:center;\"> 0.631 </td>\n <td style=\"text-align:center;\"> 0.244 </td>\n <td style=\"text-align:center;\"> 0.750 </td>\n <td style=\"text-align:center;\"> 0.234 </td>\n <td style=\"text-align:center;\"> 0.949 </td>\n <td style=\"text-align:center;\"> 0.890 </td>\n <td style=\"text-align:center;\"> 0.994 </td>\n <td style=\"text-align:center;\"> 0.090 </td>\n <td style=\"text-align:center;\"> 0.504 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.186 </td>\n <td style=\"text-align:center;\"> 0.353 </td>\n <td style=\"text-align:center;\"> 0.286 </td>\n <td style=\"text-align:center;\"> 0.722 </td>\n <td style=\"text-align:center;\"> 0.889 </td>\n <td style=\"text-align:center;\"> 0.409 </td>\n <td style=\"text-align:center;\"> 0.459 </td>\n <td style=\"text-align:center;\"> 0.376 </td>\n <td style=\"text-align:center;\"> 0.021 </td>\n <td style=\"text-align:center;\"> 0.189 </td>\n </tr>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> CD4+ </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Any ENV PTEG </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.555 </td>\n <td style=\"text-align:center;\"> 0.771 </td>\n <td style=\"text-align:center;\"> 0.108 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.104 </td>\n <td style=\"text-align:center;\"> 0.717 </td>\n <td style=\"text-align:center;\"> 0.084 </td>\n <td style=\"text-align:center;\"> 0.627 </td>\n <td style=\"text-align:center;\"> 0.319 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.248 </td>\n <td style=\"text-align:center;\"> 0.565 </td>\n <td style=\"text-align:center;\"> 0.434 </td>\n <td style=\"text-align:center;\"> 0.103 </td>\n <td style=\"text-align:center;\"> 0.117 </td>\n <td style=\"text-align:center;\"> 0.856 </td>\n <td style=\"text-align:center;\"> 0.861 </td>\n <td style=\"text-align:center;\"> 0.128 </td>\n <td style=\"text-align:center;\"> 0.773 </td>\n <td style=\"text-align:center;\"> 0.723 </td>\n </tr>\n</tbody>\n</table>"
allMDS.html %>% cat(file = 'tables/allMDS.html')
allMDS %>%
kable("latex", escape = F, digits = 3, align = 'c', booktabs = T) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> allMDS.latex
allMDS.latex %>% cat(file = 'tables/allMDS.tex')
# Print latex table to tex file
cat(docHead, allMDS.latex, docTail, file = 'tables/allMDS.tex')
write_csv(allMDS, 'tables/allMDSGlmPValues.csv')
at each taxonomic level
# How many taxa do we see if we agglomerate at different levels
psN2 %>% tax_table %>% as.data.frame %>% dplyr::select(Phylum:Genus) %>% colnames -> taxLevels
data_frame(taxLevels) %>%
mutate(ntaxa = map(taxLevels,
function(lev){
psN2 %>% tax_glom(lev) %>% ntaxa
}
)) %>%
mutate(ntaxa = unlist(ntaxa)) %>%
pass -> NTaxaAtLevel
NTaxaAtLevel
## # A tibble: 5 x 2
## taxLevels ntaxa
## <chr> <int>
## 1 Phylum 5
## 2 Class 12
## 3 Order 17
## 4 Family 39
## 5 Genus 104
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN2), ps = list((psN2))) -> specRow
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN1), psCount = list((psN1))) -> specRowC
D2K_savename <- function(distmat){
# cascade names forward with the D2K operation
require(MiRKAT)
out <- MiRKAT::D2K(distmat)
colnames(out) <- colnames(distmat)
rownames(out) <- rownames(distmat)
out
}
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
NTaxaAtLevel %>%
mutate(ps = map(ntaxa, ~tip_glom_saveid(psN2, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(ps = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRow) %>%
# calculate jensen-shannon distance matrix
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
## Loading required package: cluster
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) -> tmp
tmp %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf0 # Original way
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
# I use psN1 because I need count data for some downstream steps.
NTaxaAtLevel %>%
mutate(psCount = map(ntaxa, ~tip_glom_saveid(psN1, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(psCount = map(psCount, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRowC) %>%
pass -> tmp
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
## Warning: Setting row names on a tibble is deprecated.
tmp %>%
# calculate jensen-shannon distance matrix
mutate(ps = map(psCount, ~transform_sample_counts(., function(x) {x/sum(x)}))) %>%
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf
print(psDf)
## # A tibble: 6 x 9
## taxLevels ntaxa psCount ps jsd jsdMat kjsd psNoZero clr
## <chr> <int> <list> <list> <list> <list> <list> <list> <list>
## 1 Phylum 5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class 12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order 17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family 39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus 104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species 651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
MirMulti <- function(x, KsDf = psDf, ps = psN2, nperm = 9999){
Ks = KsDf$kjsd
# I bind to the phyloseq object and then peel off again later to guerentee
# that the y-data is in the same order as the Ks
locPS <- phylo_join(ps, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
ydata <- ydata0[!yna]
loc.Ks <- lapply(Ks, function(K){K[!yna, !yna]})
bcxJSD <- MiRKAT(y = jac_box_cox_2(ydata), Ks = loc.Ks, out_type = "C", method = 'permutation', nperm = nperm)
medJSD <- MiRKAT(y = medcode(ydata), Ks = loc.Ks, out_type = "D", method = 'permutation', nperm = nperm)
mmDf = data.frame(
taxLevels = KsDf$taxLevels,
ntaxa = KsDf$ntaxa,
bcxJSD = bcxJSD$indivP, medJSD = medJSD$indivP,
bcxJSDOmni = bcxJSD$omnibus_p, medJSDOmni = medJSD$omnibus_p)
mmDf
}
# Test case
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& visitno == 2 & ct == 'T') -> test.immune1
test.mm <- MirMulti(test.immune1, Ks = psDf, nperm = 999)
test.mm
## taxLevels ntaxa bcxJSD medJSD bcxJSDOmni medJSDOmni
## 1 Phylum 5 0.217 0.228 0.162 0.0435
## 2 Class 12 0.282 0.228 0.162 0.0435
## 3 Order 17 0.048 0.011 0.162 0.0435
## 4 Family 39 0.123 0.031 0.162 0.0435
## 5 Genus 104 0.247 0.038 0.162 0.0435
## 6 Species 651 0.505 0.131 0.162 0.0435
ptm = proc.time()
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(mir = map(data,
~MirMulti(., Ks = psDf, ps = psN2, nperm = 999)
)) %>%
dplyr::select(-data) %>% unnest(mir) %>%
pass -> mirLevels
proc.time() - ptm
## user system elapsed
## 2.561 0.000 2.561
mirLevels %>% dplyr::select(-ntaxa, -medJSD) %>% spread(key = taxLevels, value = bcxJSD)
## # A tibble: 20 x 11
## type antigen month bcxJSDOmni medJSDOmni Class Family Genus Order
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.EN… 6.5 0.762 0.285 0.497 0.512 0.704 0.396
## 2 CD4+ ANY.EN… 12 0.054 0.148 0.014 0.146 0.136 0.057
## 3 IgA gp41 0 0.995 0.864 0.997 0.875 0.916 0.953
## 4 IgA gp41 6.5 0.259 0.362 0.316 0.546 0.401 0.395
## 5 IgA gp41 12 0.525 0.916 0.878 0.419 0.349 0.712
## 6 IgA p24 0 0.886 0.388 0.921 0.815 0.897 0.917
## 7 IgA p24 6.5 0.982 0.830 0.905 0.933 0.8 0.852
## 8 IgA p24 12 0.444 0.102 0.167 0.604 0.639 0.298
## 9 IgG Con.6.… 6.5 0.388 0.052 0.233 0.178 0.178 0.208
## 10 IgG Con.6.… 12 0.003 0.053 0.04 0.002 0.001 0.002
## 11 IgG gp41 0 0.158 0.0365 0.269 0.124 0.229 0.047
## 12 IgG gp41 6.5 0.334 0.0765 0.115 0.26 0.148 0.131
## 13 IgG gp41 12 0.310 0.722 0.156 0.3 0.179 0.135
## 14 IgG gp70_B… 6.5 0.526 0.949 0.217 0.415 0.429 0.376
## 15 IgG gp70_B… 12 0.114 0.086 0.169 0.0720 0.0920 0.032
## 16 IgG p24 0 0.893 0.476 0.718 0.951 0.611 0.913
## 17 IgG p24 6.5 0.0185 0.853 0.005 0.051 0.028 0.047
## 18 IgG p24 12 0.782 0.696 0.971 0.469 0.558 0.41
## 19 IgG ZM96.g… 6.5 0.097 0.0405 0.0860 0.113 0.057 0.124
## 20 IgG ZM96.g… 12 0.280 0.350 0.464 0.184 0.105 0.0960
## # ... with 2 more variables: Phylum <dbl>, Species <dbl>
mirLevels %>% dplyr::select(-ntaxa, -bcxJSD) %>% spread(key = taxLevels, value = medJSD)
## # A tibble: 20 x 11
## type antigen month bcxJSDOmni medJSDOmni Class Family Genus Order
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.EN… 6.5 0.762 0.285 0.102 0.193 0.268 0.0970
## 2 CD4+ ANY.EN… 12 0.054 0.148 0.047 0.164 0.184 0.061
## 3 IgA gp41 0 0.995 0.864 0.855 0.824 0.905 0.917
## 4 IgA gp41 6.5 0.259 0.362 0.652 0.485 0.43 0.364
## 5 IgA gp41 12 0.525 0.916 0.918 0.683 0.647 0.818
## 6 IgA p24 0 0.886 0.388 0.92 0.559 0.791 0.847
## 7 IgA p24 6.5 0.982 0.830 0.845 0.831 0.694 0.556
## 8 IgA p24 12 0.444 0.102 0.028 0.24 0.401 0.114
## 9 IgG Con.6.… 6.5 0.388 0.052 0.0830 0.019 0.014 0.015
## 10 IgG Con.6.… 12 0.003 0.053 0.0990 0.038 0.042 0.015
## 11 IgG gp41 0 0.158 0.0365 0.211 0.03 0.032 0.01
## 12 IgG gp41 6.5 0.334 0.0765 0.059 0.06 0.0690 0.021
## 13 IgG gp41 12 0.310 0.722 0.905 0.747 0.332 0.698
## 14 IgG gp70_B… 6.5 0.526 0.949 0.723 0.949 0.974 0.906
## 15 IgG gp70_B… 12 0.114 0.086 0.525 0.044 0.025 0.023
## 16 IgG p24 0 0.893 0.476 0.397 0.287 0.182 0.752
## 17 IgG p24 6.5 0.0185 0.853 0.479 0.673 0.502 0.69
## 18 IgG p24 12 0.782 0.696 0.745 0.329 0.374 0.348
## 19 IgG ZM96.g… 6.5 0.097 0.0405 0.0820 0.062 0.0720 0.043
## 20 IgG ZM96.g… 12 0.280 0.350 0.143 0.523 0.498 0.341
## # ... with 2 more variables: Phylum <dbl>, Species <dbl>
I’d like to combine the above into one table, when it isn’t 7:45. Probably has soemething to do with merging columns or something. Or maybe I just want to plot it as a figure.
mirLevels %>%
gather(metric, P, bcxJSD:medJSD) -> mirDat
mirDat %>% dplyr::select(type:month, bcxJSD = bcxJSDOmni, medJSD = medJSDOmni) %>%
group_by(type, antigen, month) %>%
summarize(bcxJSD = mean(bcxJSD), medJSD = mean(medJSD)) %>%
gather(metric, P, bcxJSD, medJSD) -> mirOmni
mirOmni
## # A tibble: 40 x 5
## # Groups: type, antigen [8]
## type antigen month metric P
## <chr> <chr> <dbl> <chr> <dbl>
## 1 CD4+ ANY.ENV.PTEG 6.5 bcxJSD 0.762
## 2 CD4+ ANY.ENV.PTEG 12 bcxJSD 0.054
## 3 IgA gp41 0 bcxJSD 0.995
## 4 IgA gp41 6.5 bcxJSD 0.259
## 5 IgA gp41 12 bcxJSD 0.525
## 6 IgA p24 0 bcxJSD 0.886
## 7 IgA p24 6.5 bcxJSD 0.982
## 8 IgA p24 12 bcxJSD 0.444
## 9 IgG Con.6.gp120.B 6.5 bcxJSD 0.388
## 10 IgG Con.6.gp120.B 12 bcxJSD 0.003
## # ... with 30 more rows
NTaxaAtLevel %>% bind_rows(specRow[,1:2]) %>% unite(nLev, taxLevels, ntaxa, remove = FALSE) -> NTaxaAtLevel2
NTaxaAtLevel2
## # A tibble: 6 x 3
## nLev taxLevels ntaxa
## <chr> <chr> <int>
## 1 Phylum_5 Phylum 5
## 2 Class_12 Class 12
## 3 Order_17 Order 17
## 4 Family_39 Family 39
## 5 Genus_104 Genus 104
## 6 Species_651 Species 651
bind_rows(
permKernTable %>% mutate(metric = 'med'),
permKernTableGaus %>% mutate(metric = 'bcx')
) %>%
dplyr::select(type, antigen, month, metric, mir.P) %>%
pass -> WufPData
fixant <- function(df){
df %>%
mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
#mutate(metric = stringr::str_replace_all(metric, "bcx", "")) %>%
pass
}
fixstuff <- function(df){
df %>%
fixant %>%
mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
pass
}
mirDat %>% fixant %>% head
## # A tibble: 6 x 9
## type antigen month taxLevels ntaxa bcxJSDOmni medJSDOmni metric P
## <chr> <chr> <dbl> <fct> <int> <dbl> <dbl> <chr> <dbl>
## 1 IgA gp41 6.5 Phylum 5 0.259 0.362 bcxJSD 0.0970
## 2 IgA gp41 6.5 Class 12 0.259 0.362 bcxJSD 0.316
## 3 IgA gp41 6.5 Order 17 0.259 0.362 bcxJSD 0.395
## 4 IgA gp41 6.5 Family 39 0.259 0.362 bcxJSD 0.546
## 5 IgA gp41 6.5 Genus 104 0.259 0.362 bcxJSD 0.401
## 6 IgA gp41 6.5 Species 651 0.259 0.362 bcxJSD 0.731
mirDat %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
fixstuff %>%
ggplot(aes(x = ntaxa, y = P, col = factor(month), fill = factor(month))) +
geom_point(pch = 21) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) +
scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) +
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
#geom_hline(data = mirOmni, aes(yintercept = P, col = factor(month))) +
#annotation_logticks(sides = 'bl') +
#geom_rug(data = mirOmni, aes(y = P, col = factor(month)), inherit.aes = F) +
geom_point(data = mirOmni %>% ungroup %>% fixstuff,
aes(x = 3, y = P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 22, size = 2) +
geom_point(data = WufPData %>% ungroup %>% fixant,
aes(x = 1000, y = mir.P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 24, size = 2) +
scale_colour_manual(values=cbPalette) +
scale_fill_manual(values=alpha(cbPalette, 0.5)) +
labs(col = "month", fill = "month") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) -> pjsd0
options(repr.plot.width=8, repr.plot.height= 6)
pjsd0
options(par0)
# I'd like to add weighted unifrac as a tick mark on the right.
NTaxaAtLevel2
## # A tibble: 6 x 3
## nLev taxLevels ntaxa
## <chr> <chr> <int>
## 1 Phylum_5 Phylum 5
## 2 Class_12 Class 12
## 3 Order_17 Order 17
## 4 Family_39 Family 39
## 5 Genus_104 Genus 104
## 6 Species_651 Species 651
# New combined data frame that has omnibus, regular, and wunifrac all in one
bind_rows(
mirDat %>% mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
mutate(test = "JSD"),
mirOmni %>% ungroup %>% mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
mutate(taxLevels = "Omnibus") %>% mutate(test = "Omnibus"),
WufPData %>% ungroup %>% dplyr::rename(P = mir.P) %>%
mutate(taxLevels = "WUnifrac") %>% mutate(test = "WUnifrac")
) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1, "ANY.ENV.PTEG"))) %>%
mutate(type = factor(type, levels = c("IgA", "IgG", "CD4+"))) %>%
mutate(taxLevels = factor(taxLevels, levels = c("Omnibus", NTaxaAtLevel2$taxLevels, "WUnifrac"))) %>%
dplyr::select(-c(bcxJSDOmni:medJSDOmni))%>%
unite(nLev, taxLevels, ntaxa, remove = FALSE) %>%
mutate(nLev = stringr::str_replace(nLev, "_NA", "")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "\\.", " "))) %>%
mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "_", " "))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
mutate(test = factor(test, levels = c('Omnibus', 'JSD', 'WUnifrac'))) %>%
pass -> mirDat2
## Warning in bind_rows_(x, .id): binding factor and character vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
mirDat2 %>% filter(type == 'CD4+')
## # A tibble: 32 x 9
## type antigen month nLev taxLevels ntaxa metric P test
## <fct> <fct> <dbl> <chr> <fct> <int> <chr> <dbl> <fct>
## 1 CD4+ ANY ENV PTEG 6.5 Phylum_5 Phylum 5 bcx 0.86 JSD
## 2 CD4+ ANY ENV PTEG 6.5 Class_12 Class 12 bcx 0.497 JSD
## 3 CD4+ ANY ENV PTEG 6.5 Order_17 Order 17 bcx 0.396 JSD
## 4 CD4+ ANY ENV PTEG 6.5 Family_39 Family 39 bcx 0.512 JSD
## 5 CD4+ ANY ENV PTEG 6.5 Genus_104 Genus 104 bcx 0.704 JSD
## 6 CD4+ ANY ENV PTEG 6.5 Species_651 Species 651 bcx 0.919 JSD
## 7 CD4+ ANY ENV PTEG 12 Phylum_5 Phylum 5 bcx 0.908 JSD
## 8 CD4+ ANY ENV PTEG 12 Class_12 Class 12 bcx 0.014 JSD
## 9 CD4+ ANY ENV PTEG 12 Order_17 Order 17 bcx 0.057 JSD
## 10 CD4+ ANY ENV PTEG 12 Family_39 Family 39 bcx 0.146 JSD
## # ... with 22 more rows
mirDat2 %>% head
## # A tibble: 6 x 9
## type antigen month nLev taxLevels ntaxa metric P test
## <fct> <fct> <dbl> <chr> <fct> <int> <chr> <dbl> <fct>
## 1 IgA gp41 6.5 Phylum_5 Phylum 5 bcx 0.0970 JSD
## 2 IgA gp41 6.5 Class_12 Class 12 bcx 0.316 JSD
## 3 IgA gp41 6.5 Order_17 Order 17 bcx 0.395 JSD
## 4 IgA gp41 6.5 Family_39 Family 39 bcx 0.546 JSD
## 5 IgA gp41 6.5 Genus_104 Genus 104 bcx 0.401 JSD
## 6 IgA gp41 6.5 Species_651 Species 651 bcx 0.731 JSD
mirDat2 %>%
filter(type == "IgG") %>%
ggplot(aes(x = taxLevels, y = P, col = factor(month), fill = factor(month), shape = test)) +
geom_point(size = 2) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) +
#scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) +
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
scale_shape_manual(values = c(22, 21, 24)) +
scale_colour_manual(values=cbPalette) +
scale_fill_manual(values=alpha(cbPalette, 0.5)) +
labs(col = "month", fill = "month", x = "Taxonomic Level") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) -> pjsd
options(repr.plot.width=6, repr.plot.height= 8)
pjsd
ggsave('figures/KernelPVsLevel.png', width = 6, height = 8)
options(par0)
X-axis is now spaced evenly
Table SX. P values of kernel regression tests. Circles indicate jensen shannon values at different taxonomic resolutions. Squares are the omnibus p-value for that cohort of tests. Triangles indicate kernel regression p-values for the corresponding weighted unifrac test.
The blue and red lines indicate p values of 5% and 1% respectively.
Observations: The weighted unifrac test is sensitive. In cases where only one taxonic level hits, weighted unifrac often also falls at some statistically significant value. The omnibus p value is often higher than the weighted unifrac one. Weighted unifrac seems like a good test for identifying patterns at any level that relate to an outcome. The jensen shannon informs us about which level the pattern is observed.
I might even be able to drill down to every level.
model_each_species <- function(ps, f, pthresh = 1, q = FALSE){
# Start with the otu table
ps %>%
# reshape it so we have clr values for every taxon-sample pair
otu_table %>% as.data.frame %>% rownames_to_column("Sample") %>% gather(Taxon, clr, -Sample) %>%
# bind that to the sample data
# doing this here seems remarkably inefficient, but its not creating a bottleneck so I'll leave it.
left_join(
ps %>%
# the sample data need to have MDS1 and MDS2 appended to them
phylo_join(
psN2.pcoa %>% scores(display = "sites") %>% # hardcoded psN2.pcoa
as.data.frame %>%
rownames_to_column %>%
dplyr::select('rowname', 'MDS1', 'MDS2'),
by = 'rowname'
) %>%
# back to binding to sample data
sample_data %>% as('data.frame') %>% rownames_to_column("Sample"),
by = 'Sample') %>%
group_by(Taxon) %>% # group and nest for model run
nest %>%
mutate(Mod = map(data, f)) %>% # apply model over each species
mutate(Glance = map(Mod, glance), Tidy = map(Mod, tidy)) %>% # extract relevant data from model
# view model
dplyr::select(Taxon, Tidy) %>% unnest %>%
mutate(term = gsub('[\\( \\)]','', term)) %>% # remove parentheses from "(Intercept)"
gather(meas, val, estimate:p.value) %>%
unite(meas, term, meas) %>% spread(meas, val) %>% arrange(clr_estimate) %>%
dplyr::select(Taxon, Intercept_estimate, clr_estimate, clr_std.error, clr_p.value) %>%
# add q value
{if(q) mutate(., clr_q.value = p2q(clr_p.value)) else .} %>%
filter(clr_p.value < pthresh) %>%
#Join taxonomy information
left_join(
as.data.frame(ps@tax_table@.Data) %>% as.tibble %>% dplyr::select(Kingdom:Genus, Species, tag) %>%
mutate(tag = as.character(tag)), # mutate so tag is and taxon are both character class
by = c("Taxon" = "tag")) %>%
pass
}
model_each_species_for_antigen <- function(antigen, ps = psN2){
ps %>%
model_each_species(function(df){glm(medcode2(get(antigen)) ~ clr, data = df, family = 'binomial')}, q = TRUE, pthresh = 1)
}
ColsToRun <- c('IgG_Con.6.gp120.B_Month_6.5', 'IgG_Con.6.gp120.B_Month_12', 'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_gp70_B.CaseA_V1_V2_Month_12', 'IgG_ZM96.gp140_Month_6.5', 'MDS1', 'isMale' )
model_each_species_case <- function(ps){
ps %>% model_each_species(function(df){glm(MDS1 ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = Taxon[order(clr_estimate)])) %>%
mutate(test = 'gaussian', antigen = 'MDS1') %>%
pass -> loc_mds1Glms
ps %>% model_each_species(function(df){glm(log10(IgG_Con.6.gp120.B_Month_12 + 100) ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
mutate(test = 'gaussian', antigen = 'Con.6.gp120.B_Month_12') %>%
pass -> loc_gp120Glms
tibble(antigen = ColsToRun) %>% mutate(model = map(antigen, ~model_each_species_for_antigen(., ps = ps))) %>%
unnest %>% mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
mutate(test = 'binomial') %>%
pass-> loc_logitCoefs
#list(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs)
bind_rows(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs) %>% dplyr::select(test, antigen, everything())
}
#psDf %>% mutate(ps2 = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.))))) -> test
#psDf[[1,"clr"]] %>% tax_table
psDf[[1]]
## [1] "Phylum" "Class" "Order" "Family" "Genus" "Species"
psDf %>% print
## # A tibble: 6 x 9
## taxLevels ntaxa psCount ps jsd jsdMat kjsd psNoZero clr
## <chr> <int> <list> <list> <list> <list> <list> <list> <list>
## 1 Phylum 5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class 12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order 17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family 39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus 104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species 651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
ptm = proc.time()
psDf %>% dplyr::select(taxLevels, ntaxa, clr) %>% mutate(localmod = map(clr, model_each_species_case)) ->psDfLoc
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
proc.time() - ptm
## user system elapsed
## 41.692 0.032 41.727
print(psDfLoc)
## # A tibble: 6 x 4
## taxLevels ntaxa clr localmod
## <chr> <int> <list> <list>
## 1 Phylum 5 <S4: phyloseq> <tibble [50 × 15]>
## 2 Class 12 <S4: phyloseq> <tibble [120 × 15]>
## 3 Order 17 <S4: phyloseq> <tibble [170 × 15]>
## 4 Family 39 <S4: phyloseq> <tibble [390 × 15]>
## 5 Genus 104 <S4: phyloseq> <tibble [1,040 × 15]>
## 6 Species 651 <S4: phyloseq> <tibble [6,510 × 15]>
psDfLoc %>% dplyr::select(-clr) %>% unnest(localmod) -> tmp
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
psDfLoc$taxLevels
## [1] "Phylum" "Class" "Order" "Family" "Genus" "Species"
tmp %>% mutate(taxLevels = factor(taxLevels, levels = psDfLoc$taxLevels)) -> LocalTests
LocalTests %>%
filter(antigen != "MDS1") %>%
write_csv("tables/AllLocalTests.csv")
LocalTests %>% head
## # A tibble: 6 x 17
## taxLevels ntaxa test antigen Taxon Intercept_estim… clr_estimate
## <fct> <int> <chr> <chr> <chr> <dbl> <dbl>
## 1 Phylum 5 gaus… MDS1 Bact… 1.94 -0.762
## 2 Phylum 5 gaus… MDS1 Porp… -1.03 -0.514
## 3 Phylum 5 gaus… MDS1 Acti… -0.201 -0.104
## 4 Phylum 5 gaus… MDS1 Camp… 0.439 0.242
## 5 Phylum 5 gaus… MDS1 Bact… -1.42 0.445
## 6 Phylum 5 gaus… Con.6.… Bact… 3.76 -0.285
## # ... with 10 more variables: clr_std.error <dbl>, clr_p.value <dbl>,
## # clr_q.value <dbl>, Kingdom <fct>, Phylum <chr>, Class <chr>,
## # Order <chr>, Family <chr>, Genus <chr>, Species <chr>
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_q.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
labs(y = "q-value", x = "Taxonomic Level") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalQEveryLevel.png', width = 6, height = 8, units = "in")
BOOKMARK
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) +# scale_y_log10() +
labs(y = expression(paste(italic("p"),"-value")), x = "Taxonomic Level") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalPEveryLevel.png', width = 6, height = 8, units = "in")
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
labs(y = expression(paste(italic("p"),"-value")), x = "Taxonomic Level") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalPEveryLevel_LogScale.png', width = 6, height = 8, units = "in")
order_taxa_by_mds1 <- function(df){
# this has to be a model_each_species type of data frame
df %>% filter(antigen == 'MDS1' & test == 'gaussian') %>%
mutate(TaxonF = factor(Taxon, levels = Taxon[order(clr_estimate)])) -> mds1df
df %>% mutate(TaxonF = factor(Taxon, levels = levels(mds1df$TaxonF)))
}
To my annoyance, everything is labeled with IgG except gp120_12
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen,
levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5',
'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12'))) %>%
filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
dplyr::select(antigen:clr_estimate) %>%
dplyr::select(-Intercept_estimate) %>%
mutate(cordir = sign(clr_estimate)) %>%
pass
## # A tibble: 11 x 4
## antigen Taxon clr_estimate cordir
## <fct> <chr> <dbl> <dbl>
## 1 Con.6.gp120.B_Month_12 Bacteroidetes.4 -0.307 -1
## 2 Con.6.gp120.B_Month_12 Firmicutes.4 -0.295 -1
## 3 Con.6.gp120.B_Month_12 Firmicutes.2 -0.157 -1
## 4 Con.6.gp120.B_Month_12 Clostridia -0.143 -1
## 5 Con.6.gp120.B_Month_12 Bacteroides 0.204 1
## 6 Con.6.gp120.B_Month_12 Firmicutes.5 0.207 1
## 7 IgG_gp41_Month_0 Porphyromonadaceae.1 -2.12 -1
## 8 IgG_gp41_Month_0 Porphyromonadaceae -1.69 -1
## 9 IgG_gp41_Month_0 Bacteroidetes.1 -1.20 -1
## 10 IgG_gp41_Month_0 Clostridia 1.31 1
## 11 IgG_gp41_Month_0 Anaerococcus 1.53 1
LocalTests %>% pull(antigen) %>% unique
## [1] "MDS1" "Con.6.gp120.B_Month_12"
## [3] "IgG_Con.6.gp120.B_Month_6.5" "IgG_Con.6.gp120.B_Month_12"
## [5] "IgG_gp41_Month_0" "IgG_gp41_Month_6.5"
## [7] "IgG_gp70_B.CaseA_V1_V2_Month_12" "IgG_ZM96.gp140_Month_6.5"
## [9] "isMale"
labs(y = expression(paste(italic(“p”),“-value”)), x = “Taxonomic Level”) +
# Family Hits
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen, levels = c(
'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'
))) %>%
pass -> tmp
#tmp$antigen %>% unique
tmp %>% filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
pull(Taxon) %>% unique -> useFamily
tmp %>% filter(Taxon %in% useFamily) %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%
ggplot(aes(x = TaxonF, y = clr_estimate,
color = (clr_p.value < 0.05), shape =(clr_q.value < 0.2))) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = clr_estimate - 2*clr_std.error, ymax = clr_estimate + 2*clr_std.error)) +
labs(y = "Coefficient",
x = "Family Level Taxon",
col = expression(paste(italic("p"), "-value")),
shape = "q-value") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_hline(yintercept = 0) +
facet_wrap(~antigen, ncol = 1, scales = 'free_y')
# Show the censored ones accross - so this would be everything with at least one hit
# but also show what they are in all cases.
ggsave('figures/anyFamilyIgg.png', width = 6, height = 8)
I think its worth digging into clostridia and Prophyromonidaceae with stacked bars
Family level
Lets come back to this after we’ve done the local tests. Since we need them to color code the axes.
psDf %>% print
## # A tibble: 6 x 9
## taxLevels ntaxa psCount ps jsd jsdMat kjsd psNoZero clr
## <chr> <int> <list> <list> <list> <list> <list> <list> <list>
## 1 Phylum 5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class 12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order 17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family 39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus 104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species 651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(ps) %>% pull %>%.[[1]]
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 39 taxa and 21 samples ]
## sample_data() Sample Data: [ 21 samples by 35 sample variables ]
## tax_table() Taxonomy Table: [ 39 taxa by 12 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 39 tips and 38 internal nodes ]
print(psDf)
## # A tibble: 6 x 9
## taxLevels ntaxa psCount ps jsd jsdMat kjsd psNoZero clr
## <chr> <int> <list> <list> <list> <list> <list> <list> <list>
## 1 Phylum 5 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 2 Class 12 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 3 Order 17 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 4 Family 39 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 5 Genus 104 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
## 6 Species 651 <S4: phy… <S4: p… <S3: … <dbl [2… <dbl … <S4: ph… <S4: …
#nFamilyTaxa <- NTaxaAtLevel %>% filter(taxLevels == 'Family') %>% pull(ntaxa)
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(psNoZero) %>% pull %>%.[[1]] %>%
otu_table %>% as.data.frame %>%
pass -> myRel
ptm = proc.time()
phiBoot <- boot(data = myRel, statistic = boot_phi, R = 1000)
proc.time() - ptm
## user system elapsed
## 6.513 0.000 6.604
ptm = proc.time()
tidyCI <- unwarn(
tidy(phiBoot,conf.int=TRUE,conf.method="bca")
)
proc.time() - ptm
## user system elapsed
## 13.825 0.028 13.853
myRel %>% make_proportionality_matrix %>%
as.data.frame %>%
rownames_to_column("TaxonX") %>% gather(TaxonY, phi, -TaxonX) %>%
filter(TaxonX != TaxonY) %>% data.frame(tidyCI) -> namedTidyCI
head(LocalTests)
## # A tibble: 6 x 17
## taxLevels ntaxa test antigen Taxon Intercept_estim… clr_estimate
## <fct> <int> <chr> <chr> <chr> <dbl> <dbl>
## 1 Phylum 5 gaus… MDS1 Bact… 1.94 -0.762
## 2 Phylum 5 gaus… MDS1 Porp… -1.03 -0.514
## 3 Phylum 5 gaus… MDS1 Acti… -0.201 -0.104
## 4 Phylum 5 gaus… MDS1 Camp… 0.439 0.242
## 5 Phylum 5 gaus… MDS1 Bact… -1.42 0.445
## 6 Phylum 5 gaus… Con.6.… Bact… 3.76 -0.285
## # ... with 10 more variables: clr_std.error <dbl>, clr_p.value <dbl>,
## # clr_q.value <dbl>, Kingdom <fct>, Phylum <chr>, Class <chr>,
## # Order <chr>, Family <chr>, Genus <chr>, Species <chr>
LocalTests %>% filter(test == 'gaussian' &
antigen == 'MDS1' &
clr_p.value <0.05 &
clr_q.value < 0.2&
taxLevels == "Family") -> tmp
tmp %>% pull(Taxon) -> MDS1Fam
tmp %>% filter(clr_estimate < 0) %>% pull(Taxon) -> lowMDS1Fam
tmp %>% filter(clr_estimate >= 0) %>% pull(Taxon) -> highMDS1Fam
I’d like to do this, but for gp41 baseline and gp120 as well.
useFamily
## [1] "Bacteroidetes.4" "Firmicutes.4" "Firmicutes.2"
## [4] "Clostridia" "Bacteroides" "Firmicutes.5"
## [7] "Porphyromonadaceae.1" "Porphyromonadaceae" "Bacteroidetes.1"
## [10] "Anaerococcus"
LocalTests %>% head
## # A tibble: 6 x 17
## taxLevels ntaxa test antigen Taxon Intercept_estim… clr_estimate
## <fct> <int> <chr> <chr> <chr> <dbl> <dbl>
## 1 Phylum 5 gaus… MDS1 Bact… 1.94 -0.762
## 2 Phylum 5 gaus… MDS1 Porp… -1.03 -0.514
## 3 Phylum 5 gaus… MDS1 Acti… -0.201 -0.104
## 4 Phylum 5 gaus… MDS1 Camp… 0.439 0.242
## 5 Phylum 5 gaus… MDS1 Bact… -1.42 0.445
## 6 Phylum 5 gaus… Con.6.… Bact… 3.76 -0.285
## # ... with 10 more variables: clr_std.error <dbl>, clr_p.value <dbl>,
## # clr_q.value <dbl>, Kingdom <fct>, Phylum <chr>, Class <chr>,
## # Order <chr>, Family <chr>, Genus <chr>, Species <chr>
reshape2::melt
## function (data, ..., na.rm = FALSE, value.name = "value")
## {
## UseMethod("melt", data)
## }
## <bytecode: 0x55fe9434db80>
## <environment: namespace:reshape2>
targStat <- "phi"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
p_phi_1 <- ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" ) +
# rectangles around the three clusters, positioned by eye
geom_rect(aes(xmin = 0 + 0.5, xmax = 10 - 0.5, ymin = 0 + 0.5, ymax = 10 - 0.5),
fill = "transparent", color = "gray20", size = 1.5) +
geom_rect(aes(xmin = 13 + 0.5, xmax = 24 - 0.5, ymin = 13 + 0.5, ymax = 24 - 0.5),
fill = "transparent", color = "gray20", size = 1.5) +
geom_rect(aes(xmin = 23 + 0.5, xmax = 37 - 0.5, ymin = 23 + 0.5, ymax = 37 - 0.5),
fill = "transparent", color = "gray20", size = 1.5)
p_phi_1
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen, levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))) %>%
pass -> tmp
tmp %>% dplyr::select(antigen, Taxon, clr_estimate, clr_p.value, clr_q.value) %>%
mutate(clr_sign = sign(clr_estimate)) %>%
mutate(isHit = ifelse(clr_p.value < 0.05 & clr_q.value < 0.2, 1, 0)) %>%
mutate(Taxon = factor(Taxon, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass -> chorddata
targStat <- "conf.low"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_low
p_phi_low
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
targStat <- "conf.high"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_high
p_phi_high
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
chorddata %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%
ggplot(
aes(x = Taxon, y = antigen, alpha = factor(isHit), color = factor(clr_sign))) +
scale_alpha_discrete(range = c(0, 1)) +
guides(alpha = FALSE) +
theme_minimal() +
coord_fixed(ratio = 2) +
scale_colour_manual(values = c("red", "blue")) +
theme(axis.text.x = element_text(
angle = 90, vjust = 0.5, size = 7, hjust = 1,
face = ifelse(levels(chorddata$Taxon) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(chorddata$Taxon) %in% useFamily, "black", "grey30")),
plot.margin = unit(c(0,3,1,3), "cm")
) +
#guides(col = TRUE) +
guides(color=guide_legend(title="Sign of GLM")) +
labs(x = "Family",y = "Antigen -- Month" ) +
geom_point() -> guitar_chords
## Warning: Using alpha for a discrete variable is not advised.
par <- options()
options(repr.plot.width=10, repr.plot.height= 5)
guitar_chords
options(par)
p_phi_1a <- p_phi_1 +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
plot.margin = unit(c(1, 3, -5.5, 4), "cm"))
par <- options()
options(repr.plot.width=8, repr.plot.height= 8)
p_phi_cord <- cowplot::plot_grid(p_phi_1a, guitar_chords, nrow = 2, align = "v")
p_phi_cord
#phi_legend <- cowplot::get_legend(p_phi_1)
# cowplot::ggdraw(
# cowplot::plot_grid(
# cowplot::plot_grid(p_phi_1a, guitar_chords, ncol = 1, align = "v"),
# cowplot::plot_grid(phi_legend, NULL, ncol = 1),
# rel_widths = c(10,1)
# ))
ggsave('figures/phi_heatmap_withlegend.png', width = 10, height = 10)
options(par)
# More color-blind friendly colorbalettes
#http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=10
cb10 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a')
cb12 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928')
# Less color-blind friendly, but still nice.
#https://sashat.me/2017/01/11/list-of-20-simple-distinct-colors/
trub20 <- c('#e6194b','#3cb44b','#ffe119','#0082c8','#f58231','#911eb4','#46f0f0','#f032e6','#d2f53c','#fabebe','#008080','#e6beff','#aa6e28','#fffac8','#800000','#aaffc3','#808000','#ffd8b1','#000080','#808080','#FFFFFF','#000000')
options(repr.plot.width=8, repr.plot.height= 4)
Bookmark Here. Stuck for strange reasons.
ordered_pub_id_df <- psN2 %>% sample_data %>% dplyr::select(pub_id, rMDS1, newname, MDS1) %>% arrange(rMDS1) %>% mutate(MDS1 = round(MDS1, 2), fig3tick = paste(pub_id, MDS1,sep = "_"))
## Warning in class(x) <- c(subclass, "tbl_df", "tbl", "data.frame"): Setting
## class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer
## be an S4 object
ordered_pub_id_df
## pub_id rMDS1 newname MDS1 fig3tick
## 1 21 1 01_Sample-89 -0.79 21_-0.79
## 2 45 2 02_Sample-76 -0.70 45_-0.7
## 3 282 3 03_Sample-57 -0.69 282_-0.69
## 4 162 4 04_Sample-73 -0.56 162_-0.56
## 5 236 5 05_Sample-82 -0.53 236_-0.53
## 6 40 6 06_Sample-93 -0.51 40_-0.51
## 7 92 7 07_Sample-98 -0.47 92_-0.47
## 8 238 8 08_Sample-67 -0.40 238_-0.4
## 9 176 9 09_Sample-60 -0.24 176_-0.24
## 10 151 10 10_Sample-72 -0.19 151_-0.19
## 11 290 11 11_Sample-86 -0.16 290_-0.16
## 12 19 12 12_Sample-84 -0.04 19_-0.04
## 13 3 13 13_Sample-87 0.02 3_0.02
## 14 49 14 14_Sample-63 0.03 49_0.03
## 15 247 15 15_Sample-91 0.42 247_0.42
## 16 123 16 16_Sample-62 0.55 123_0.55
## 17 208 17 17_Sample-81 0.65 208_0.65
## 18 175 18 18_Sample-92 0.68 175_0.68
## 19 281 19 19_Sample-78 0.71 281_0.71
## 20 288 20 20_Sample-66 0.77 288_0.77
## 21 89 21 21_Sample-75 1.46 89_1.46
fig3tick <- ordered_pub_id_df %>% pull(fig3tick)
p_phy <- plot_bar(psN2, x = 'newname', fill = 'Phylum') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("All Phyla")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_phy
#ggsave('plots/Phyla_by_wuf1.png')
p_firm <- subset_taxa(psN2, Phylum == 'Firmicutes') %>%
plot_bar( x = 'newname', fill = 'Order') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) +
scale_fill_manual(values = cb10) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_firm
#ggsave('plots/MostFirmicutesAreClostridiales.png')
p_clostridia <- subset_taxa(psN2, Class == 'Clostridia') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Order Clostridiales (All Class Clostridia)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_clostridia
# p_porph <- subset_taxa(psN2, Family == 'Porphyromonadaceae') %>%
# plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) #+ theme_bw()
# p_porph
# p_bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>% # all class (Bacteroidia), order (Bacteroidales)
# plot_bar(x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) #+ theme_bw()
# p
# ggsave('figures/Bacteroidetes_Families.png')
p_ClosXI <- subset_taxa(psN2, Family == 'Clostridiales_Incertae_Sedis_XI') %>%
plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Genera of Family Clostridiales_Incertae_Sedis_XI")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_ClosXI
#ggsave('figures/Clostridiales_Incertae_Sedis_XI_Genus.png')
p_Bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Class Bacteroidetes (All Order Bacteroidiales)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_Bact
#ggsave('figures/Bacteroides.png')
options(repr.plot.width=14, repr.plot.height= 8)
sb <- cowplot::plot_grid(p_phy, p_Bact,p_clostridia,p_ClosXI,ncol = 2, labels = c("A", "B", "C", "D"))
sb
cowplot::save_plot('figures/stacked_bars.png', sb, base_width = 14, base_height = 8)
cowplot::save_plot('figures/stacked_bars.svg', sb, base_width = 14, base_height = 8)
# psDf %>%
# mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
# mutate(Tax = map(ps, ~data.frame(tax_table(.)))) %>%
# mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
# pass -> psDf1
psDf %>%
mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
mutate(Tax = map(ps, ~as.data.frame(.@tax_table@.Data))) %>%
mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
pass -> psDf1
psDf1 %>%
.[1:5,] %>%
mutate(Tax = map(Tax, ~dplyr::select(.,-oldname2))) %>%
print
## # A tibble: 5 x 12
## taxLevels ntaxa psCount ps jsd jsdMat kjsd psNoZero clr OTU
## <chr> <int> <list> <lis> <lis> <list> <lis> <list> <lis> <lis>
## 1 Phylum 5 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 2 Class 12 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 3 Order 17 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 4 Family 39 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 5 Genus 104 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## # ... with 2 more variables: Tax <list>, OTUCount <list>
# https://stackoverflow.com/questions/50341012/return-the-mapped-object-if-expression-inside-of-purrrpossibly-fails/50341205#50341205
rm_oldname2 <- function(x){
f = possibly(function() dplyr::select(x, -oldname2), otherwise = x)
f()
}
psDf1 %>%
#.[1:5,] %>%
mutate(Tax = map(Tax, rm_oldname2)) %>%
pass -> psDf1b
print(psDf1)
## # A tibble: 6 x 12
## taxLevels ntaxa psCount ps jsd jsdMat kjsd psNoZero clr OTU
## <chr> <int> <list> <lis> <lis> <list> <lis> <list> <lis> <lis>
## 1 Phylum 5 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 2 Class 12 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 3 Order 17 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 4 Family 39 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 5 Genus 104 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## 6 Species 651 <S4: p… <S4:… <S3:… <dbl … <dbl… <S4: ph… <S4:… <dat…
## # ... with 2 more variables: Tax <list>, OTUCount <list>
# Show which species level OTUs are contained in each agglomerated group:
psDf1b %>%
.[1:5,] %>%
mutate(TaxIdx = map(Tax, function(df){
df %>%
mutate(tag = as.character(tag), oldGroups = as.character(oldGroups)) %>%
dplyr::select(tag, oldGroups) %>%
mutate(oldGroups = strsplit(oldGroups, ",")) %>%
unnest(oldGroups)
})) %>%
dplyr::select(taxLevels, TaxIdx) %>%
unnest(TaxIdx) %>%
mutate(oldGroups = trimws(oldGroups)) %>% # Some of these have leading or trailing whitespace
spread(taxLevels, tag) %>%
dplyr::select(oldGroups, Phylum, Class, Order, Family, Genus) %>%
pass -> taxGroupMapping
write_csv(taxGroupMapping, 'tables/taxGroupMapping.csv')
# Print out each otu table (relative abundances).
walk2(psDf1b$taxLevels, psDf1b$OTU,
~write.csv(.y, file = paste0("tables/OTU/otu_",.x, ".csv")))
# Print out each otu table (counts).
walk2(psDf1b$taxLevels, psDf1b$OTUCount,
~write.csv(.y, file = paste0("tables/OTU/otuCount_",.x, ".csv")))
# Print out each taxonomy table.
walk2(psDf1b$taxLevels, psDf1b$Tax,
~write_csv(.y, path = paste0("tables/Tax/tax_",.x, ".csv")))
The reviewer asks if it associates with MDS1. I’ll check for associations with immunogenicity as well.
Calculate the alpha diversity values for psN1
bN1 <- breakaway(psN1)
Initial look at alpha diveristy values.
plot(bN1)
Large confidence intervals. Also, probably best to examine in log space going forward.
Range of breakaway estimates
summary(bN1)$estimate %>% range
## [1] 151.0282 441.4724
btN_MDS <- betta(summary(bN1)$estimate,
summary(bN1)$error,
make_design_matrix(psN1, "MDS1")
)
btN_MDS
## $table
## Estimates Standard Errors p-values
## (Intercept) 243.9720 16.82711 0.000
## predictors -31.7198 27.31450 0.246
##
## $cov
## (Intercept) predictors
## (Intercept) 287.4724 57.2095
## predictors 57.2095 757.4669
##
## $ssq_u
## [1] 1932.271
##
## $homogeneity
## [1] 6.250059e+01 1.546382e-06
##
## $global
## [1] 215.6907 0.0000
##
## $blups
## [1] 190.9549 235.6741 241.4954 196.3625 222.4246 270.5355 262.2978
## [8] 280.2053 194.4011 282.1363 231.4248 227.0622 253.5892 260.3696
## [15] 266.7135 248.6283 296.0946 232.9051 201.2419 264.0743 264.8205
Not in any way that is statistically significant (p = 0.246)
Some pre-computing
richEsts <- bN1 %>% map_dbl("estimate")
richCI <- bN1 %>% map_df("interval") %>% t %>% as.data.frame() %>% rename(richLb = V1, richUb = V2) %>% merge(as.data.frame(richEsts), by = "row.names") %>% transform(row.names = Row.names, richEst = richEsts, Row.names = NULL, richEsts = NULL)
# need to add mds1 to this, or just tack this all onto psN1rich
psN1rich <- psN1
sample_data(psN1rich) <- merge(psN1@sam_data, richCI, by = "row.names") %>% transform(row.names = Row.names, Row.names = NULL)
Plot richness vs mds1
psN1rich %>% sample_data %>% as.data.frame %>%
ggplot(aes(x = MDS1, y = richEst)) + geom_point() #+ geom_errorbar()
As above but with error bars.
psN1richP <- psN1rich %>% sample_data %>% as.data.frame %>%
ggplot(aes(x = MDS1, y = richEst, ymin = richLb, ymax = richUb)) + geom_point() + geom_errorbar() + scale_y_log10()
psN1richP
The reviewer actually asked whether unifrac distance associates with alpha diversity. I’ve done that with MDS1, but not distance per se. Lets do one kernel regression, where we ask whether unifrac distance associates with richness.
Kernel regression, weighted unifrac vs richness
mirRich <- MiRKAT(y = richEsts, Ks = wufKN2, out_type = "C", method = 'permutation', nperm = jnperm)
mirRich
## [1] 0.22179
Very similar p-value to running betta against MDS1.
PCoA figure that compares MDS1 and MDS2 to richness
psN1richMDSP<- psN1rich %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = richEst), size = 5, stroke = 1, shape = 21) +
viridis::scale_fill_viridis(name = 'richEst', direction = 1, option = "viridis") +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
cowplot::theme_cowplot()
psN1richMDSP
Here is a discrete example originally I used Gp120 month 6.5. Switching to gp41 month 6.5, since subsequent analyis suggested that it does show a relationship, and so makes a more useful exemplar
gpLocmtx <- cbind(1,
psN1 %>% sample_data %>% as.matrix %>% as.data.frame %>% pull(IgG_gp41_Month_6.5) %>% as.character %>% as.numeric %>% medcode
)
colnames(gpLocmtx) = c("(Intercept)", "predictors")
btNLoc <- betta(summary(bN1)$estimate,
summary(bN1)$error,
gpLocmtx
)
btNLoc
## $table
## Estimates Standard Errors p-values
## (Intercept) 209.10249 13.30004 0.000
## predictors 76.38459 22.30439 0.001
##
## $cov
## (Intercept) predictors
## (Intercept) 274.4924 -274.4924
## predictors -274.4924 771.9782
##
## $ssq_u
## [1] 780.9728
##
## $homogeneity
## [1] 24.0326989 0.1949011
##
## $global
## [1] 323.1194 0.0000
##
## $blups
## [1] 184.0879 256.1046 284.6409 195.8995 211.0920 290.4048 217.0720
## [8] 288.8596 204.3579 291.4781 213.4344 210.8080 231.2472 222.1869
## [15] 291.5582 286.9768 296.3785 285.9490 200.8389 280.4231 287.5847
Step 1, make a data frame with BN1, but pub_id for each sample
Pre calculations
SampleNameToPubId <- psN1 %>% sample_data %>% as.data.frame %>% rownames_to_column() %>% as.tibble %>% dplyr::select(rowname, pub_id) %>% na.omit()
## Warning in class(x) <- c(subclass, "tbl_df", "tbl", "data.frame"): Setting
## class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer
## be an S4 object
A function that takes immunogenicity data, x (which we will pass in with tidyverse mapping functions), ad, the breakaway richness summary, and a transformation, defaults to medcode2 of the immunogenicity data.
immuneValpha <- function(x, ad = bN1, transformation = medcode2){
#x <- arrange(x, rowname)
x2 <- right_join(x, SampleNameToPubId, by = 'pub_id')
x3 <- x2[is.finite(x2$mag),]
ad2 <- ad[is.finite(x2$mag)]
class(ad2) <- c("alpha_estimates", "list")
gpXmtx = cbind(1, transformation(x3$mag))
#gpXmtx
thing <- betta(summary(ad2)$estimate,
summary(ad2)$error,
gpXmtx
)
out <- as.data.frame(t(thing$table[2,]))
#colnames(out) = "pval"
# add R^2 value, from simple pearson correlation
locCor <- cor(summary(ad2)$estimate, gpXmtx[,2], method = "pearson")
r2 <- locCor^2
out2 <- data.frame(out, r2)
out2
}
# test a single use case
immuneValpha(use.immune %>% filter(visitno == 9, type == "IgG", antigen == "Con.6.gp120.B"), bN1, transformation = medcode)
## Estimates Standard.Errors p.values r2
## 1 21.03933 22.41648 0.348 0.002631791
Actually run the analysis, raw table
immuneAlphaCompiled <- use.immune %>%
group_by(type, antigen, month) %>%
do(immuneValpha(.)) %>%
ungroup # if you forget to ungroup, pvalue calculations don't work correctly
immuneAlphaTable <- immuneAlphaCompiled %>% mutate(qval = p.adjust(`p.values`, method = "BH"))
immuneAlphaTable
## # A tibble: 20 x 8
## type antigen month Estimates Standard.Errors p.values r2 qval
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.ENV.… 6.5 40.7 19.9 0.041 3.26e-2 0.0911
## 2 CD4+ ANY.ENV.… 12 44.8 17.9 0.013 5.57e-3 0.0371
## 3 IgA gp41 0 -1.84 27.3 0.946 3.22e-2 0.946
## 4 IgA gp41 6.5 16.3 26.2 0.534 1.45e-1 0.628
## 5 IgA gp41 12 68.4 28.7 0.017 2.11e-1 0.0425
## 6 IgA p24 0 5.46 24.9 0.827 6.81e-3 0.871
## 7 IgA p24 6.5 56.6 21.9 0.01 1.60e-1 0.0333
## 8 IgA p24 12 32.9 23.8 0.168 1.37e-2 0.28
## 9 IgG Con.6.gp… 6.5 21.0 22.4 0.348 2.63e-3 0.535
## 10 IgG Con.6.gp… 12 17.5 22.7 0.44 1.33e-3 0.587
## 11 IgG gp41 0 17.1 25.6 0.503 7.75e-4 0.628
## 12 IgG gp41 6.5 76.4 22.3 0.001 2.47e-1 0.01
## 13 IgG gp41 12 61.0 23.4 0.009 9.44e-2 0.0333
## 14 IgG gp70_B.C… 6.5 -5.83 24.6 0.812 6.38e-3 0.871
## 15 IgG gp70_B.C… 12 70.5 14.7 0 6.21e-2 0
## 16 IgG p24 0 -30.8 20.3 0.129 3.33e-2 0.235
## 17 IgG p24 6.5 60.6 23.5 0.01 1.17e-1 0.0333
## 18 IgG p24 12 47.3 26.4 0.074 2.34e-1 0.148
## 19 IgG ZM96.gp1… 6.5 21.9 25.3 0.387 3.55e-2 0.553
## 20 IgG ZM96.gp1… 12 66.8 22.0 0.002 9.50e-2 0.0133
Pretty up the table above for publication
conciseImmuneAlphaTable <- immuneAlphaTable %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Coef=Estimates, R2 = r2, P=`p.values`, Q = qval)
toAlphaTable <- conciseImmuneAlphaTable %>%
mutate(
Coef = cell_spec(format(Coef, digits = 3), "html",
bold = ifelse(P < 0.05, T, F),
italic = ifelse(P < 0.05 & Coef < 0, T, F),
background = ifelse(P < 0.05, ifelse(Coef < 0, "lightsalmon", "lightblue"), "")),
R2 = cell_spec(round(R2, digits = 3), "html"),
P = cell_spec(format_round(P, 3), "html",
bold = ifelse(P < 0.05, T, F),
background = ifelse(P < 0.05, 'yellow', '')
),
Q = cell_spec(format_round(Q, 3), "html",
bold = ifelse(Q < 0.2, T, F),
background = ifelse(Q < 0.2, 'lightyellow', '')
)
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen))
toAlphaTable %>% kable("html", escape = F, digits = 3, align = 'c') %>%
kable_styling("striped", "hover", full_width = F) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> conciseAlphaTable.html
conciseAlphaTable.html %>% cat(file = 'tables/conciseAlphaTable.html')
conciseAlphaTable.html
| Type | Antigen | Month | Coef | R2 | P | Q |
|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 40.69 | 0.033 | 0.041 | 0.091 |
| 12.0 | 44.76 | 0.006 | 0.013 | 0.037 | ||
| IgA | gp41 | 0.0 | -1.84 | 0.032 | 0.946 | 0.946 |
| 6.5 | 16.30 | 0.145 | 0.534 | 0.628 | ||
| 12.0 | 68.38 | 0.211 | 0.017 | 0.042 | ||
| p24 | 0.0 | 5.46 | 0.007 | 0.827 | 0.871 | |
| 6.5 | 56.58 | 0.16 | 0.010 | 0.033 | ||
| 12.0 | 32.85 | 0.014 | 0.168 | 0.280 | ||
| IgG | Con.6.gp120.B | 6.5 | 21.04 | 0.003 | 0.348 | 0.535 |
| 12.0 | 17.55 | 0.001 | 0.440 | 0.587 | ||
| gp41 | 0.0 | 17.12 | 0.001 | 0.503 | 0.628 | |
| 6.5 | 76.38 | 0.247 | 0.001 | 0.010 | ||
| 12.0 | 61.02 | 0.094 | 0.009 | 0.033 | ||
| gp70 B.CaseA V1-V2 | 6.5 | -5.83 | 0.006 | 0.812 | 0.871 | |
| 12.0 | 70.51 | 0.062 | 0.000 | 0.000 | ||
| p24 | 0.0 | -30.85 | 0.033 | 0.129 | 0.235 | |
| 6.5 | 60.60 | 0.117 | 0.010 | 0.033 | ||
| 12.0 | 47.26 | 0.234 | 0.074 | 0.148 | ||
| ZM96.gp140 | 6.5 | 21.89 | 0.036 | 0.387 | 0.553 | |
| 12.0 | 66.78 | 0.095 | 0.002 | 0.013 |
immuneAlphaCompiledGaus <- use.immune %>%
group_by(type, antigen, month) %>%
do(immuneValpha(., transformation = jac_box_cox_2)) %>%
ungroup
immuneAlphaTableGaus <- immuneAlphaCompiledGaus %>% mutate(qval = p.adjust(`p.values`, method = "BH"))
immuneAlphaTableGaus
## # A tibble: 20 x 8
## type antigen month Estimates Standard.Errors p.values r2 qval
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.ENV.P… 6.5 30.5 12.6 0.016 1.34e-1 0.064
## 2 CD4+ ANY.ENV.P… 12 27.5 16.4 0.094 9.89e-4 0.235
## 3 IgA gp41 0 -6.51 18.0 0.718 3.43e-2 0.818
## 4 IgA gp41 6.5 2.15 15.5 0.89 6.04e-2 0.89
## 5 IgA gp41 12 32.9 16.4 0.045 8.62e-2 0.15
## 6 IgA p24 0 -6.77 18.5 0.714 2.82e-4 0.818
## 7 IgA p24 6.5 15.9 17.9 0.374 3.31e-2 0.68
## 8 IgA p24 12 25.2 20.8 0.226 6.30e-2 0.452
## 9 IgG Con.6.gp1… 6.5 6.48 17.2 0.706 2.44e-6 0.818
## 10 IgG Con.6.gp1… 12 5.36 15.9 0.736 7.92e-5 0.818
## 11 IgG gp41 0 10.1 15.4 0.51 1.19e-4 0.774
## 12 IgG gp41 6.5 29.7 9.75 0.002 1.76e-1 0.01
## 13 IgG gp41 12 31.7 4.85 0 1.66e-1 0
## 14 IgG gp70_B.Ca… 6.5 3.24 18.5 0.861 2.06e-3 0.89
## 15 IgG gp70_B.Ca… 12 36.1 6.73 0 3.27e-2 0
## 16 IgG p24 0 -14.4 17.9 0.421 1.01e-1 0.702
## 17 IgG p24 6.5 21.7 16.0 0.175 1.44e-1 0.389
## 18 IgG p24 12 32.0 17.0 0.06 2.08e-1 0.171
## 19 IgG ZM96.gp140 6.5 8.40 13.8 0.542 4.73e-4 0.774
## 20 IgG ZM96.gp140 12 32.2 4.71 0 8.80e-2 0
conciseImmuneAlphaTableGaus <- immuneAlphaTableGaus %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Coef=Estimates, R2 = r2, P=`p.values`, Q = qval)
toAlphaTableGaus <- conciseImmuneAlphaTableGaus %>%
mutate(
Coef = cell_spec(format(Coef, digits = 3), "html",
bold = ifelse(P < 0.05, T, F),
italic = ifelse(P < 0.05 & Coef < 0, T, F),
background = ifelse(P < 0.05, ifelse(Coef < 0, "lightsalmon", "lightblue"), "")),
R2 = cell_spec(ifelse(R2 < 0.01, format(R2, digits = 2),round(R2, digits = 2)) , "html"),
P = cell_spec(format_round(P, 3), "html",
bold = ifelse(P < 0.05, T, F),
background = ifelse(P < 0.05, 'yellow', '')
),
Q = cell_spec(format_round(Q, 3), "html",
bold = ifelse(Q < 0.2, T, F),
background = ifelse(Q < 0.2, 'lightyellow', '')
)
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen))
toAlphaTableGaus %>% kable("html", escape = F, digits = 3, align = 'c') %>%
kable_styling("striped", "hover", full_width = F) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> conciseAlphaTableGaus.html
conciseAlphaTableGaus.html %>% cat(file = 'tables/conciseAlphaTableGaus.html')
conciseAlphaTableGaus.html
| Type | Antigen | Month | Coef | R2 | P | Q |
|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 30.50 | 0.13 | 0.016 | 0.064 |
| 12.0 | 27.49 | 9.9e-04 | 0.094 | 0.235 | ||
| IgA | gp41 | 0.0 | -6.51 | 0.03 | 0.718 | 0.818 |
| 6.5 | 2.15 | 0.06 | 0.890 | 0.890 | ||
| 12.0 | 32.88 | 0.09 | 0.045 | 0.150 | ||
| p24 | 0.0 | -6.77 | 2.8e-04 | 0.714 | 0.818 | |
| 6.5 | 15.87 | 0.03 | 0.374 | 0.680 | ||
| 12.0 | 25.20 | 0.06 | 0.226 | 0.452 | ||
| IgG | Con.6.gp120.B | 6.5 | 6.48 | 2.4e-06 | 0.706 | 0.818 |
| 12.0 | 5.36 | 7.9e-05 | 0.736 | 0.818 | ||
| gp41 | 0.0 | 10.14 | 1.2e-04 | 0.510 | 0.774 | |
| 6.5 | 29.66 | 0.18 | 0.002 | 0.010 | ||
| 12.0 | 31.71 | 0.17 | 0.000 | 0.000 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 3.24 | 2.1e-03 | 0.861 | 0.890 | |
| 12.0 | 36.15 | 0.03 | 0.000 | 0.000 | ||
| p24 | 0.0 | -14.39 | 0.1 | 0.421 | 0.702 | |
| 6.5 | 21.71 | 0.14 | 0.175 | 0.389 | ||
| 12.0 | 31.96 | 0.21 | 0.060 | 0.171 | ||
| ZM96.gp140 | 6.5 | 8.40 | 4.7e-04 | 0.542 | 0.774 | |
| 12.0 | 32.21 | 0.09 | 0.000 | 0.000 |
Richness vs alpha above, but this time colorcoding by group.
psN1richP_gp41m6 <- psN1rich %>% sample_data %>% as.data.frame %>%
ggplot(aes(x = MDS1, y = richEst, ymin = richLb, ymax = richUb, fill = as.factor(medcode_hl(IgG_gp41_Month_6.5)))) + geom_point(shape = 21, size = 4) + geom_errorbar() + scale_y_log10() + scale_fill_viridis_d(direction = -1, name = 'gp41 Primary Timepoint') + labs(x = "MDS1", y = "Richness (# ASVs)") + cowplot::theme_cowplot()
psN1richP_gp41m6
Summary figure for of alpha
Combined figure
par <- options()
#options(repr.plot.width=6, repr.plot.height= 11)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(psN1richMDSP, psN1richP_gp41m6, ncol = 1, labels = c("A", "B"), label_size = 24, align = "v")
g
cowplot::save_plot('figures/richnessMDS1Gp41.png', g, base_width = 6, base_height = 8)
This is in response to a reviewr comment. Strategy, write a function to, for one variable of interest, compare the groups, as I did for unifrac distance. I’ll use a simple logistic regression here. Similar to CapVar. I want a use.immune data set that includes whether people are a donor as a column, but that
immune.data %>%
mutate(isDonor = pub_id %in% muDoners) %>%
filter(
(type == 'IgG' &
antigen %in% ants1 &
month %in% c(6.5,12)
) |
(type %in% c('IgG', 'IgA') &
antigen %in% ants2 &
month %in% c(0,6.5,12)
) |
type == 'CD4+' &
antigen == 'ANY.ENV.PTEG' &
month %in% c(6.5, 12)
)-> allparticipants.immune
head(allparticipants.immune)
## # A tibble: 6 x 14
## visitno rx_code type antigen mag mag_bl response day month ct
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 9 CTRL IgA gp41 177. 214. 0 182 6.5 CTRL
## 2 12 CTRL IgA gp41 173 214. 0 364 12 CTRL
## 3 9 CTRL IgA p24 966 454. 0 182 6.5 CTRL
## 4 12 CTRL IgA p24 1419 454. 0 364 12 CTRL
## 5 9 CTRL IgG Con.6.… 1 1 0 182 6.5 CTRL
## 6 12 CTRL IgG Con.6.… 1 1 0 364 12 CTRL
## # ... with 4 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>,
## # isDonor <lgl>
allparticipants.test <- allparticipants.immune %>% filter(visitno ==9, antigen == "Con.6.gp120.B",type == "IgG")
head(allparticipants.test)
## # A tibble: 6 x 14
## visitno rx_code type antigen mag mag_bl response day month ct
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 9 CTRL IgG Con.6.… 1 1 0 182 6.5 CTRL
## 2 9 T2 IgG Con.6.… 14938. 1 1 182 6.5 T
## 3 9 T1 IgG Con.6.… 30458. 1 1 182 6.5 T
## 4 9 T4 IgG Con.6.… 29971. 1 1 182 6.5 T
## 5 9 T3 IgG Con.6.… 31292. 7.75 1 182 6.5 T
## 6 9 T4 IgG Con.6.… 17209 362. 1 182 6.5 T
## # ... with 4 more variables: response_j <dbl>, assay <chr>, pub_id <dbl>,
## # isDonor <lgl>
donorTest <- function(x, transformation = medcode2, family = 'binomial'){
loc_glm <- glm(transformation(mag) ~ isDonor, data = x, family = family)
loc_glm %>% broom::tidy() %>% filter(term == 'isDonorTRUE') %>% dplyr::select(-term)
}
donorTest(allparticipants.test)
## # A tibble: 1 x 4
## estimate std.error statistic p.value
## <dbl> <dbl> <dbl> <dbl>
## 1 0.380 0.506 0.751 0.453
Do on everything
allparticipants.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, visitno) %>%
do(data.frame(donorTest(.))) %>%
pass-> DonorEffectTable
DonorEffectTable
## # A tibble: 20 x 7
## # Groups: type, antigen, visitno [20]
## type antigen visitno estimate std.error statistic p.value
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.ENV.PTEG 9 9.53e- 2 0.520 1.83e- 1 0.855
## 2 CD4+ ANY.ENV.PTEG 12 -1.17e-16 0.526 -2.22e-16 1.000
## 3 IgA gp41 2 -3.89e- 1 0.512 -7.60e- 1 0.447
## 4 IgA gp41 9 -1.72e- 1 0.518 -3.33e- 1 0.739
## 5 IgA gp41 12 -2.75e- 1 0.526 -5.23e- 1 0.601
## 6 IgA p24 2 1.29e- 1 0.509 2.54e- 1 0.799
## 7 IgA p24 9 3.65e- 1 0.521 7.00e- 1 0.484
## 8 IgA p24 12 -2.32e-16 0.524 -4.44e-16 1.000
## 9 IgG Con.6.gp120.B 9 4.05e- 1 0.523 7.76e- 1 0.438
## 10 IgG Con.6.gp120.B 12 1.33e- 1 0.516 2.58e- 1 0.797
## 11 IgG gp41 2 3.57e- 1 0.513 6.95e- 1 0.487
## 12 IgG gp41 9 -1.35e- 1 0.519 -2.59e- 1 0.795
## 13 IgG gp41 12 1.33e- 1 0.516 2.58e- 1 0.797
## 14 IgG gp70_B.CaseA_V1_V2 9 4.05e- 1 0.523 7.76e- 1 0.438
## 15 IgG gp70_B.CaseA_V1_V2 12 -1.33e- 1 0.516 -2.58e- 1 0.797
## 16 IgG p24 2 -1.64e- 1 0.510 -3.22e- 1 0.747
## 17 IgG p24 9 -3.92e- 2 0.528 -7.43e- 2 0.941
## 18 IgG p24 12 -4.01e- 1 0.520 -7.72e- 1 0.440
## 19 IgG ZM96.gp140 9 9.53e- 2 0.520 1.83e- 1 0.855
## 20 IgG ZM96.gp140 12 -4.42e- 1 0.521 -8.47e- 1 0.397
Ok. There appears to be no significant difference in magnitude group for any category.
allparticipants.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, visitno) %>%
do(data.frame(donorTest(., transformation = function(x){jac_box_cox_2(x)},
family = 'gaussian'))) %>%
pass-> DonorEffectGaus
Monkeys, I guess its debug aclock.
DonorEffectGaus
## # A tibble: 20 x 7
## # Groups: type, antigen, visitno [20]
## type antigen visitno estimate std.error statistic p.value
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 CD4+ ANY.ENV.PTEG 9 0.441 0.256 1.72 0.0897
## 2 CD4+ ANY.ENV.PTEG 12 0.0673 0.265 0.254 0.800
## 3 IgA gp41 2 -0.105 0.255 -0.413 0.681
## 4 IgA gp41 9 0.236 0.259 0.911 0.366
## 5 IgA gp41 12 0.349 0.260 1.34 0.184
## 6 IgA p24 2 0.392 0.252 1.56 0.124
## 7 IgA p24 9 0.274 0.258 1.06 0.293
## 8 IgA p24 12 0.313 0.261 1.20 0.234
## 9 IgG Con.6.gp120.B 9 -0.0707 0.261 -0.271 0.787
## 10 IgG Con.6.gp120.B 12 0.0298 0.260 0.115 0.909
## 11 IgG gp41 2 0.268 0.254 1.05 0.295
## 12 IgG gp41 9 -0.113 0.261 -0.435 0.665
## 13 IgG gp41 12 -0.0472 0.260 -0.182 0.856
## 14 IgG gp70_B.CaseA_V1_V2 9 0.185 0.260 0.710 0.480
## 15 IgG gp70_B.CaseA_V1_V2 12 -0.207 0.258 -0.799 0.427
## 16 IgG p24 2 0.114 0.256 0.447 0.656
## 17 IgG p24 9 0.242 0.264 0.916 0.363
## 18 IgG p24 12 0.00988 0.260 0.0380 0.970
## 19 IgG ZM96.gp140 9 0.0605 0.262 0.231 0.818
## 20 IgG ZM96.gp140 12 -0.115 0.260 -0.442 0.660
Same deal when I do a normal logistic regression.
#save.image(file = "workspace.Rdata")